Abstract

A snap-stabilizing protocol, starting from any configuration, always behaves according to
its specification. In this paper, we present a snap-stabilizing protocol to solve the message
forwarding problem in a message-switched network. In this problem, we must manage the resources
of the system to deliver messages to any processor of the network. To this end, we use
information given by a routing algorithm. In the context of stabilization (in particular, the
system starts in an arbitrary configuration), this information can be corrupted. So, the existence
of a snap-stabilizing protocol for the message forwarding problem implies that we can ask the
system to begin forwarding messages even if routing information is initially corrupted.

In this paper, we propose two snap-stabilizing algorithms (in the state model) for the
following specification of the problem:

• Any message can be generated in a finite time. 

• Any emitted message is delivered to its destination once and only once in a finite time. 

This implies that our protocol can deliver any emitted message regardless of the state of routing 
tables in the initial configuration. 

These two algorithms are based on the previous work of [21]. Each algorithm needs a
particular method to be transformed into a snap-stabilizing one, but neither of them introduces
a significant overhead in memory or in time with respect to the algorithms of [21].



* MIS Laboratory, Université de Picardie Jules Verne, alain.cournier@u-picardie.fr

† LIP6 - UMR 7606 Université Pierre et Marie Curie - Paris 6 & INRIA Rocquencourt, swan.dubois@lip6.fr

‡ MIS Laboratory, Université de Picardie Jules Verne, vincent.villain@u-picardie.fr






1 Introduction 



The quality of a distributed system depends on its fault tolerance. Many fault-tolerant schemes have
been proposed. For instance, self-stabilization ([8]) allows the design of a system tolerating arbitrary
transient faults. A self-stabilizing system, regardless of the initial state of the system, is guaranteed
to converge to the intended behavior in a finite time. Another paradigm, called snap-stabilization,
has been introduced in [2, 3]. A snap-stabilizing protocol guarantees that, starting from any
configuration, it always behaves according to its specification. In other words, a snap-stabilizing
protocol is a self-stabilizing protocol which stabilizes in 0 time units.

In a distributed system, it is commonly assumed that each processor can exchange messages
only with its neighbors (i.e. processors with which it shares a communication link), but processors
may need to exchange messages with any processor of the network. To achieve this goal, processors
have to solve two problems: the determination of the path which messages have to follow in the
network to reach their destinations (the routing problem) and the management of network
resources in order to forward messages (the message forwarding problem).

These two problems have received great attention in the literature. The routing problem is studied for
example in [H [U [131 US IZH [3Ql [20l [231 [2S] and self-stabilizing approach can be found (directly or 
not) in pU [181 0Q2]. The forwarding problem has also been well studied, see [l2l [21] l22l 126] l27l [28] 
for example. As far as we know, the message forwarding problem has never been directly studied with a
snap-stabilizing approach (note that the protocol proposed by [17] can be used to obtain a
self-stabilizing forwarding protocol for dynamic networks, since it guarantees that routing tables
remain loop-free even if topological changes are allowed).

Informally, a message forwarding protocol allows any processor of the network to send messages
to any destination of the network, knowing that a routing algorithm computes the path that messages
have to follow to reach their destinations. Problems arise from the following fact: messages traveling
through a message-switched network ([24]) must be stored in each processor of their path before
being forwarded to the next processor on this path. This temporary storage of messages is performed
with reserved memory spaces called buffers. Obviously, each processor of the network reserves only
a finite number of buffers for message forwarding. So, it is a problem of bounded-resource
management, which exposes the network to deadlocks and livelocks if no control is performed.

In this paper, we focus on message forwarding protocols which address the problem with a
snap-stabilizing approach. The goal is to allow the system to forward messages (without losses) regardless
of the state of the routing tables. Obviously, we need these routing tables to repair themselves
in a finite time. So, we assume the existence of a self-stabilizing protocol to compute routing tables
(see pUdM). 

In the following, a valid message is a message which has been generated by a processor. As a 
consequence, an invalid message is a message which is present in the initial configuration. We can 
now specify the problem. We propose a specification of the problem where message duplications 
(i.e. the same message reaches its destination several times although it has been generated only once)
are forbidden: 

Specification 1 (SV) Specification of message forwarding problem forbidding duplication. 

• Any message can be generated in a finite time. 

• Any valid message is delivered to its destination once and only once in a finite time.

In this paper, we investigate the possibility of transforming two known message forwarding
protocols ([21]) into snap-stabilizing ones. We use a different scheme for each of them, but we prove
that these two schemes do not significantly modify the time and space complexities of these protocols.
Consequently, the main contribution of this paper is to show that it is possible to provide stronger
safety properties without significant overhead.

The remainder of this paper is organized as follows: we first present our model (Section 2). We
quickly survey the seminal work of [21] in Section 3. Then we give, prove, and analyze our two
solutions (Sections 4 and 5). Finally, we conclude with some remarks and open problems (Section 6).

2 Model and definitions 

We consider a network as an undirected connected graph G = (V, E) where V is a set of processors 
and E is the set of bidirectional asynchronous communication links. In the network, a communi- 
cation link (p, q) exists if and only if p and q are neighbors. Every processor p can distinguish all 
its links. To simplify the presentation, we refer to a link (p, q) of a processor p by the label q. We 
assume that the labels of p are stored in the set N_p.

We also use the following notations: n is the number of processors, Δ the maximal
degree, and D the diameter of the network. If p and q are two processors of the network, we denote
by dist(p, q) the length of the shortest path between p and q (i.e. the distance between p and q). In
the following, we assume that the network is identified, i.e. each processor has an identity which
is unique in the network. Moreover, we assume that all processors know the set I of all identities
of the network.

2.1 State model 

We consider the classical local shared memory model of computation (see [23]) in which
communications between neighbors are modeled by direct reading of variables instead of exchange of
messages.

In this model, the program of every processor consists of a set of shared variables (henceforth
referred to as variables) and a finite set of actions. A processor can write to its own variables
only, and read its own variables and those of its neighbors. Each action is constituted as follows:
<label> :: <guard> → <statement>. The label is a name used to refer to the rule in the discussion.
The guard of an action in the program of p is a Boolean expression involving variables of p and its
neighbors. The statement of an action of p updates one or more variables of p. An action can be
executed only if its guard is satisfied.
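As an illustration of the <label> :: <guard> → <statement> form, the following minimal Python sketch encodes a guarded action and its execution. The names `Config`, `Action`, `enabled_actions`, `execute`, and the toy action `RAISE` are hypothetical and do not come from the paper; the sketch only assumes that a configuration maps each processor to its own variables.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical encoding: a configuration maps each processor to its variables.
Config = Dict[str, Dict[str, int]]


@dataclass
class Action:
    label: str
    guard: Callable[[Config, str], bool]                 # reads p and its neighbors
    statement: Callable[[Config, str], Dict[str, int]]   # new values for p's variables


def enabled_actions(actions: List[Action], config: Config, p: str) -> List[Action]:
    """Actions of p whose guard holds in the given configuration."""
    return [a for a in actions if a.guard(config, p)]


def execute(action: Action, config: Config, p: str) -> Config:
    """Return the configuration obtained when p executes an enabled action."""
    if not action.guard(config, p):
        raise ValueError(f"{action.label} is not enabled at {p}")
    new_config = {q: dict(variables) for q, variables in config.items()}
    new_config[p].update(action.statement(config, p))  # p writes only its own variables
    return new_config


# Toy network and toy action (assumptions for the example only).
NEIGHBORS = {"a": ["b"], "b": ["a"]}


def max_neighbor_x(config: Config, p: str) -> int:
    return max(config[q]["x"] for q in NEIGHBORS[p])


RAISE = Action(
    label="Raise",
    guard=lambda config, p: config[p]["x"] < max_neighbor_x(config, p),
    statement=lambda config, p: {"x": max_neighbor_x(config, p)},
)
```

Here a processor may read the variable x of its neighbors but updates only its own x, which mirrors the read/write restriction of the model.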

The state of a processor is defined by the value of its variables. The state of a system is the 
product of the states of all processors. We refer to the state of a processor and the system as a 
(local) state and (global) configuration, respectively. We denote by C the set of all configurations of the
system.

Let γ ∈ C and let A be an action of p (p ∈ V). A is enabled for p in γ if and only if the guard of A is
satisfied by p in γ. Processor p is enabled in γ if and only if at least one action is enabled at p in
γ. Let a distributed protocol P be a collection of actions denoted by →, on C. An execution of a
protocol P is a maximal sequence of configurations Γ = γ_0 γ_1 ... γ_i γ_{i+1} ... such that, ∀i ≥ 0, γ_i → γ_{i+1}
(called a step) if γ_{i+1} exists, else γ_i is a terminal configuration. Maximality means that the sequence
is either finite (and no action of P is enabled in the terminal configuration) or infinite. All executions
considered here are assumed to be maximal. ℰ is the set of all executions of P.

As we already said, each execution is decomposed into steps. Each atomic step is composed
of three sequential phases: (i) every processor evaluates its guards, (ii) a daemon chooses some
enabled processors, (iii) each chosen processor executes one of its enabled actions. When the
three phases are done, the next step begins. A daemon can be defined in terms of fairness and
distribution. There exist several kinds of fairness assumptions. Here, we present the strong fairness,
weak fairness, and unfairness assumptions. Under a strongly fair daemon, every processor that
is enabled infinitely often is chosen by the daemon infinitely often to execute an action. When a
daemon is weakly fair, every continuously enabled processor is eventually chosen by the daemon.
Finally, the unfair daemon is the weakest scheduling assumption: it can forever prevent a processor
from executing an action unless it is the only enabled processor. Concerning the distribution, we
assume that the daemon is distributed, meaning that, at each step, if one or several processors are
enabled, then the daemon chooses at least one of these processors to execute an action.

We consider that any processor p is neutralized in the step γ_i → γ_{i+1} if p was enabled in γ_i and
not enabled in γ_{i+1}, but did not execute any action in γ_i → γ_{i+1}. To compute the time complexity,
we use the definition of round (introduced in [10] and modified by [3]). This definition captures the
execution rate of the slowest processor in any execution. The first round of Γ ∈ ℰ, noted Γ', is the
minimal prefix of Γ containing the execution of one action or the neutralization of every enabled
processor from the initial configuration. Let Γ'' be the suffix of Γ such that Γ = Γ'Γ''. The second
round of Γ is the first round of Γ'', and so on.
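The definition of a round can be made concrete with a small Python sketch. It assumes a hypothetical trace format in which each step is recorded as a triple (processors enabled before the step, processors that executed an action, processors enabled after the step); the names `first_round_length` and `split_rounds` are illustrative only.

```python
from typing import List, Set, Tuple

# Hypothetical trace format: (enabled before, executed, enabled after) for each step.
Step = Tuple[Set[str], Set[str], Set[str]]


def first_round_length(steps: List[Step]) -> int:
    """Number of steps in the first round of a finite trace.

    A processor enabled in the initial configuration leaves the pending set
    as soon as it executes an action or is neutralized.
    """
    if not steps:
        return 0
    pending = set(steps[0][0])
    if not pending:
        return 0  # terminal configuration: nothing to wait for
    for k, (before, executed, after) in enumerate(steps, start=1):
        neutralized = {p for p in before if p not in executed and p not in after}
        pending -= executed | neutralized
        if not pending:
            return k
    return len(steps)  # the trace ends before the round is complete


def split_rounds(steps: List[Step]) -> List[List[Step]]:
    """Cut a finite trace into consecutive rounds."""
    rounds = []
    while steps:
        k = first_round_length(steps)
        if k == 0:
            break
        rounds.append(steps[:k])
        steps = steps[k:]
    return rounds
```

In a trace where a executes in the first step, and b executes while c is neutralized in the second step, the first round spans exactly these two steps.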

2.2 Message-switched networks 

Today, most computer networks use a variant of the message-switching method (also called the
store-and-forward method). This is why we have chosen to work with this switching model. In this
section, we present this method (see [21] for a detailed presentation).

Each processor has B buffers for temporarily storing messages. The model assumes that each 
buffer can store a whole message and that each message needs only one buffer to be stored. The 
switching method is modeled by four types of moves: 

1. Generation: when a processor sends a new message, it "creates" a new message in one of 
its empty buffers. We assume that the network may allow this move as soon as at least one 
buffer of the processor is empty. 

2. Forwarding: a message m is forwarded (copied) from a processor p to an empty buffer in
the next processor q on its route (determined by the routing algorithm). We assume that the
network may allow this move as soon as at least one buffer of q is empty.

3. Consumption: a message m occupying a buffer in its destination is delivered to this
processor. We assume that the network may always allow this move.

4. Erasing: a message m is erased from a buffer. We assume that the network may allow this 
move as soon as the message is forwarded at least one time or delivered to its destination. 
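The four moves can be sketched as operations on a processor with B buffers. This is a minimal Python illustration under stated assumptions: the class names `Slot` and `Node`, the `done` flag (recording that a message has been forwarded at least once or delivered), and the choice to refuse a second consumption of the same slot are all hypothetical and are not taken from [21].

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Slot:
    payload: str
    dest: str
    done: bool = False  # forwarded at least once, or delivered


class Node:
    def __init__(self, name: str, B: int):
        self.name = name
        self.buffers: List[Optional[Slot]] = [None] * B
        self.delivered: List[str] = []

    def _free_index(self) -> Optional[int]:
        for i, slot in enumerate(self.buffers):
            if slot is None:
                return i
        return None

    def generate(self, payload: str, dest: str) -> bool:
        """Generation: allowed as soon as one buffer is empty."""
        i = self._free_index()
        if i is None:
            return False
        self.buffers[i] = Slot(payload, dest)
        return True

    def forward(self, i: int, nxt: "Node") -> bool:
        """Forwarding: copy buffer i into an empty buffer of the next processor."""
        slot, j = self.buffers[i], nxt._free_index()
        if slot is None or j is None:
            return False
        nxt.buffers[j] = Slot(slot.payload, slot.dest)
        slot.done = True
        return True

    def consume(self, i: int) -> bool:
        """Consumption: deliver a message that sits at its destination."""
        slot = self.buffers[i]
        if slot is None or slot.dest != self.name or slot.done:
            return False
        self.delivered.append(slot.payload)
        slot.done = True
        return True

    def erase(self, i: int) -> bool:
        """Erasing: only after the message was forwarded or delivered."""
        slot = self.buffers[i]
        if slot is None or not slot.done:
            return False
        self.buffers[i] = None
        return True
```

With B = 1, a second generation on the same processor is refused until the first message has been forwarded and erased, which is exactly the bounded-resource situation a controller has to manage.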

2.3 Stabilization 

In this section, we give formal definitions of self- and snap-stabilization using notations introduced 
inEU 

Definition 1 (Self-Stabilization [8]) Let T be a task, and S_T a specification of T. A protocol
P is self-stabilizing for S_T if and only if ∀Γ ∈ ℰ, there exists a finite prefix Γ' = (γ_0, γ_1, ..., γ_l) of Γ
such that any execution starting from γ_l satisfies S_T.

Definition 2 (Snap-Stabilization [2, 3]) Let T be a task, and S_T a specification of T. A
protocol P is snap-stabilizing for S_T if and only if ∀Γ ∈ ℰ, Γ satisfies S_T.

This definition has the two following consequences. We can see that a snap-stabilizing protocol
for S_T is a self-stabilizing protocol for S_T with a stabilization time of 0 time units. A common
method used to prove that a protocol is snap-stabilizing is to distinguish an action as a "starting
action" (i.e. an action which initiates a computation) and to prove the following property for every
execution of the protocol: if a processor requests it, the computation is initiated by a starting action
in a finite time and every computation initiated by a starting action satisfies the specification of
the task. We use these two remarks to prove the snap-stabilization of our protocol in the remainder of
this paper.

3 Fault-free protocols 

In this section, we survey the seminal work of [21]¹. Recall that this work assumes that routing
tables are correct in the initial configuration. To simplify the presentation, we assume that the
routing algorithm induces only minimal paths in number of edges.

We have seen in Section 2.2 that, by default, the network always allows message moves between
buffers. But, if we exert no control over these moves, the network can reach unacceptable situations
such as deadlocks, livelocks, or message losses. If such situations appear, the specifications of message
forwarding are not respected.

In order to avoid deadlocks, we must define an algorithm which permits or forbids various moves
in the network (as a function of the current occupation of buffers). Such an algorithm is a controller. If
a controller C ensures the following property: in any execution, C prevents the network from reaching a
deadlock, then C is a deadlock-free controller.

Livelocks can be avoided by fairness assumptions on the controller for the generation and the
forwarding of messages. Message losses are avoided by the use of identifiers on messages. For
example, one can use the concatenation of the identity of the source and a two-value flag in order
to distinguish two consecutive identical messages generated by the same processor for a Destination
d (since all messages follow the same path).

Then, a deadlock-free controller which also prevents livelocks and message losses satisfies the
specification of the message forwarding problem.

In the case where routing tables are initially correct, [21] introduced a generic method to design
deadlock-free controllers. It consists in restricting moves of messages along the edges of an oriented graph
BG (called buffer graph) defined on the network buffers. It is easy to see that cycles in
BG can lead to deadlocks. So, the authors show that, if BG is acyclic, they can define a deadlock-free
controller on this buffer graph. In the remainder of this section, we present the two buffer graphs which
we use in our snap-stabilizing protocols.

"Destination-based" buffer graph. In this scheme, we assume that the routing algorithm
forwards all packets of Destination d via a directed tree T_d rooted in d. Each processor p of the
network has a buffer b_p(d) for each possible Destination d (called the target of b_p(d)). The buffer
graph has n connected components, each of them containing all the buffers which share the same
target. The connected component associated to the target d is isomorphic to T_d. The reader can
find an example of such a graph in Figure 1.

Since each connected component of this graph is a tree, this oriented graph is acyclic.
Consequently, [21] allows us to define a deadlock-free controller on this graph. Note that this scheme uses
n buffers per processor. So, we need n^2 buffers on the whole network.
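The construction can be checked on a small example. The following Python sketch builds a "destination-based" buffer graph from next-hop tables and tests its acyclicity with a topological sort (Kahn's algorithm). The function names and the dictionary encoding of the network are assumptions made for the illustration; the routing tables are computed here by a plain breadth-first search, which merely stands in for the routing algorithm.

```python
from collections import deque


def shortest_path_next_hops(adj):
    """next_hop[p][d]: neighbor of p on a shortest path to d (BFS from each d)."""
    next_hop = {p: {} for p in adj}
    for d in adj:
        seen, queue = {d}, deque([d])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    next_hop[v][d] = u
                    queue.append(v)
    return next_hop


def destination_buffer_graph(adj, next_hop):
    """One buffer (p, d) per processor and destination; edges follow the routing tables."""
    edges = {(p, d): [] for p in adj for d in adj}
    for p in adj:
        for d in adj:
            if p != d:
                edges[(p, d)].append((next_hop[p][d], d))
    return edges


def is_acyclic(edges):
    """Kahn's algorithm: the graph is acyclic iff every vertex can be removed."""
    indegree = {u: 0 for u in edges}
    for u in edges:
        for v in edges[u]:
            indegree[v] += 1
    queue = deque(u for u, k in indegree.items() if k == 0)
    removed = 0
    while queue:
        u = queue.popleft()
        removed += 1
        for v in edges[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    return removed == len(edges)
```

On a network of n processors the graph has n^2 vertices. With correct tables it is acyclic; swapping two next hops so that two processors point at each other for the same destination creates a cycle, which is the kind of situation corrupted routing tables can produce.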

1 The reader is referred to [21] for a more detailed description of this work.
Figure 1: Example of a "destination-based" buffer graph (on the right) on the network of the left.

Figure 2: Example of a "distance-based" buffer graph (on the right) on the network of the left.

"Distance-based" buffer graph. In this scheme, each processor has D + 1 buffers ranked from
1 to D + 1 (recall that D is the diameter of the network). New messages are always generated
in the buffer of rank 1 of the sending processor. When a message occupying a buffer of rank i is
forwarded to a neighbor q, it is always copied into the buffer of rank i + 1 of q. We need D + 1 buffers
per processor since, in the worst case, a message is forwarded D times between its generation
and its consumption. The reader can find an example of such a graph in Figure 2.

Since messages always climb the buffer ranks, this oriented graph is acyclic.
Consequently, [21] allows us to define a deadlock-free controller on this graph. Note that this scheme uses
D + 1 buffers per processor. So, we need n(D + 1) buffers on the whole network.
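The "distance-based" scheme can be sketched in the same way. The Python code below is an illustration only: the function names and the dictionary encoding of the network are assumptions, and the graph contains an edge towards every neighbor rather than only along routed paths.

```python
from collections import deque


def eccentricity(adj, source):
    """Largest BFS distance from source in a connected graph."""
    dist, queue = {source: 0}, deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())


def diameter(adj):
    return max(eccentricity(adj, s) for s in adj)


def distance_buffer_graph(adj):
    """Buffers (p, i) for ranks 1..D+1; a move goes from rank i to rank i + 1."""
    D = diameter(adj)
    edges = {(p, i): [] for p in adj for i in range(1, D + 2)}
    for p in adj:
        for i in range(1, D + 1):
            for q in adj[p]:
                edges[(p, i)].append((q, i + 1))
    return D, edges
```

Every edge strictly increases the rank, so no cycle can exist, and the number of vertices is n(D + 1).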

4 First protocol 

4.1 Informal description 

The main idea of this section is to adapt the "destination-based" scheme (see Section 3) in order to
tolerate the corruption of routing tables in the initial configuration. To achieve this goal, we assume
the existence of a self-stabilizing silent (i.e. no actions are enabled after convergence) algorithm A to
compute routing tables, which runs simultaneously with our message forwarding protocol. Moreover,
we assume that A has priority over our protocol (i.e. a processor which has enabled actions for
both algorithms always chooses the action of A). This guarantees that routing tables are correct
and constant after a finite time. To simplify the presentation, we assume that A induces only minimal
paths in number of edges. We assume that our protocol has access to the routing table via
a function called nextHop_p(d). This function returns the identity of the neighbor of p to which p
must forward messages of Destination d.

Figure 3: Example of our buffer graph (on the right) for Destination b on the network (on the left).

We now describe our buffer graph, adapted from the "destination-based" one. Our buffer graph is
composed of n connected components, each associated to a destination d and based on the oriented
tree T_d (recall that T_d is the tree induced by the routing tables for Destination d). Consequently, we
present only one connected component, associated to a destination noted d (the others are similar).
We use two buffers per processor for Destination d. The first one, noted bufR_p(d) (for processor
p), is reserved for the reception of messages, whereas the second one, noted bufE_p(d), is used to
emit messages (see Figure 3). This scheme allows us to control the advance of messages. Indeed,
we allow a message to be forwarded from bufR_p(d) to bufE_p(d) if and only if the message is only
present in bufR_p(d), and we erase it from bufR_p(d) simultaneously. In this way, we can control the consequences
of routing table changes on messages (duplication or merge, which can involve message losses).

To avoid livelocks, we use a fair scheme of selection of the processors allowed to forward or to emit a
message for each reception buffer. We can manage this fairness with a queue of requesting processors.
Finally, we use a specific flag to prevent message losses. It is composed of the identity of the last
processor crossed by the message and a color which is dynamically given to the message when it
reaches an emission buffer. In order to distinguish such an incoming message from those contained in
the reception buffers of the neighbors of the considered processor, we give to this incoming message a color
which is not carried by any of those messages. This is why a message is considered as a triplet (m, p, c) in our
algorithm, where m is the useful information of the message, p is the identity of the last processor
crossed by the message, and c is a color (a natural integer between 0 and Δ).
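A color with the required property always exists: a processor has at most Δ neighbors, hence at most Δ colors are carried in the reception buffers of its neighbors, while Δ + 1 colors are available. The following Python sketch illustrates this pigeonhole argument. The function name `free_color` and the choice of the smallest free color are assumptions; the paper only requires some color that is not carried by those messages.

```python
def free_color(neighbor_reception_buffers, max_degree):
    """Smallest color in {0, ..., max_degree} unused in the given reception buffers.

    Each buffer is either None (empty) or a triplet (m, q, c).
    """
    used = {msg[2] for msg in neighbor_reception_buffers if msg is not None}
    for c in range(max_degree + 1):
        if c not in used:
            return c
    raise ValueError("more occupied buffers than the maximal degree allows")
```

For instance, with Δ = 3 and neighbor reception buffers carrying colors 0 and 1, the function returns 2.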

We must manage a communication between our algorithm and processors in order to know
when a processor has a message to send. We have chosen to create a Boolean shared variable
request_p (for any processor p). Processor p can set it to true when it is false and when p has a
message to send. Otherwise, p must wait until our algorithm sets the shared variable to false (which
is done when a message is generated).

The reader can find a complete example of the execution of our algorithm in Figure 4. Diagram
(N) shows the network and diagram (0) shows the initial configuration for the connected component
associated to b of the buffer graph. We observe that Δ = 3, so we need 4 different values for the
variable color; we have chosen to represent them by a natural integer in {0, 1, 2, 3}. Remark that
routing tables are incorrect (in particular, there exists a cycle involving buffers of a and c) and that
there exists an invalid message m' in the reception buffer of b (its color is 0). Then, processor c
emits a message m (its color is 0) in the reception buffer of c to obtain configuration (1). When the
message m is forwarded to the emission buffer of c, we associate to it the color 1 (since 0 is forbidden,
see configuration (2)). During the next step, message m is forwarded to the reception buffer of
a (remark that it keeps its color) and c emits (in its reception buffer) a new message m' which
has the same useful information as the invalid message present on b. So, we obtain configuration
(3). Message m can now be erased from the emission buffer of c and m' can be forwarded into
this buffer (we associate to it the color 2). These two steps lead to configuration (4). Assume that
routing tables are repaired during the next step. Simultaneously, processor a is allowed to forward
m into its emission buffer. We obtain configuration (5). Remark that the use of colors forbids the
merge between the two messages which have m' for useful information. Then, the system is able
to deliver these three messages by the repetition of the moves that we have described:

• forwarding from reception buffer to emission buffer of the same processor. 

• forwarding from emission buffer to reception buffer of two processors. 

• erasing from emission buffer or delivering. 

The sequence of configurations (6) to (12) shows an example of the end of our execution.

4.2 Algorithm 

We now formally present our protocol in Algorithm 1. We call it SSMFP_1, for Snap-Stabilizing
Message Forwarding Protocol 1. In order to simplify the presentation, we write the algorithm for
Destination d only. Obviously, each destination of the network needs a similar algorithm. Moreover,
we assume that all these algorithms run simultaneously (as they are mutually independent, this
assumption has no effect on the provided proof).

4.3 Proof of correctness 

In order to simplify the proof, we introduce a second specification of the problem. This specification 
allows message duplications. 

Specification 2 (SV') Specification of message forwarding problem allowing duplication. 

• Any message can be generated in a finite time. 

• Any valid message is delivered to its destination in a finite time.

In this section, we prove that SSMFP_1 is a snap-stabilizing message forwarding protocol for
specification SV. For that, we are going to prove successively that:

1. SSMFP_1 is a snap-stabilizing message forwarding protocol for specification SV' if routing
tables are correct in the initial configuration (Lemmas 1, 2, 3 and Proposition 1).

2. SSMFP_1 is a self-stabilizing message forwarding protocol for specification SV' even if routing
tables are corrupted in the initial configuration (Proposition 2).

3. SSMFP_1 is a snap-stabilizing message forwarding protocol for specification SV even if
routing tables are corrupted in the initial configuration (Lemmas 4, 5 and Theorem 1).



Figure 4: An example of execution of our first algorithm.
Algorithm 1 (SSMFP_1): Message forwarding protocol for Processor p with Destination d.

Data:

- n: natural integer equal to the number of processors of the network.

- I = {0, ..., n - 1}: set of processor identities of the network.

- N_p: set of neighbors of p.

- Δ: natural integer equal to the maximal degree of the network.

Message:

- (m, q, c) with m the useful information of the message, q ∈ N_p ∪ {p} the identity of the last processor
crossed by the message, and c ∈ {0, ..., Δ} a color. The message destination is the buffer index.

Variables:

- bufR_p(d), bufE_p(d): buffers which can contain a message or be empty (denoted by ε).

Input/Output:

- request_p: Boolean. The higher layer can set it to true when its value is false and when there is a
waiting message. We consider that this waiting is blocking.

Macros:

- nextMessage_p: gives the message waiting in the higher layer.

- nextDestination_p: gives the destination of nextMessage_p if it exists, null otherwise.

Procedures:

- nextHop_p(d): neighbor of p given by the routing algorithm for Destination d.

- choice_p(d): fairly chooses one of the processors which can forward or generate a message in
bufR_p(d), i.e. choice_p(d) satisfies the predicate (choice_p(d) ∈ N_p ∧ bufE_{choice_p(d)}(d) = (m, q, c) ∧
nextHop_{choice_p(d)}(d) = p) ∨ (choice_p(d) = p ∧ request_p). We can manage this fairness with a queue of
length Δ + 1 of processors which satisfy the predicate.

- deliver_p(m): delivers the message m to the higher layer of p.

- color_p(d): gives a natural integer c between 0 and Δ such that ∀q ∈ N_p, bufR_q(d) does not contain a
message with c as color.

Rules:

/* Rule for the generation of a message */

(R1) :: request_p ∧ (nextDestination_p = d) ∧ (bufR_p(d) = ε) ∧ (choice_p(d) = p) → bufR_p(d) :=
(nextMessage_p, p, 0); request_p := false

/* Rule for the internal forwarding of a message */

(R2) :: (bufE_p(d) = ε) ∧ (bufR_p(d) = (m, q, c)) ∧ ((q = p) ∨ (bufE_q(d) ≠ (m, q', c))) → bufE_p(d) :=
(m, p, color_p(d)); bufR_p(d) := ε

/* Rule for the forwarding of a message */

(R3) :: (bufR_p(d) = ε) ∧ (choice_p(d) = s) ∧ (s ≠ p) ∧ (bufE_s(d) = (m, q, c)) → bufR_p(d) := (m, s, c)¹

/* Rule for the erasing of a message after its forwarding */

(R4) :: (bufE_p(d) = (m, q, c)) ∧ (p ≠ d) ∧ (bufR_{nextHop_p(d)}(d) = (m, p, c)) ∧ (∀r ∈
N_p \ {nextHop_p(d)}, bufR_r(d) ≠ (m, p, c)) → bufE_p(d) := ε

/* Rule for the erasing of a message after its duplication */

(R5) :: (bufR_p(d) = (m, q, c)) ∧ (bufE_q(d) = (m, q', c)) ∧ (nextHop_q(d) ≠ p) → bufR_p(d) := ε

/* Rule for the consumption of a message */

(R6) :: (bufE_p(p) = (m, q, c)) → deliver_p(m); bufE_p(p) := ε

¹ The fact that q may be different from s implies that the message was in the system at the initial configuration.
We could locally delete this message but this does not improve the performance of SSMFP_1.
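To make the rules more tangible, the following Python sketch simulates one connected component (one destination d) of the protocol on a small network. It is a simplified illustration under explicit assumptions and not the authors' implementation: the class name `Component`, the `waiting` queues (request_p is taken to hold exactly when the queue of p is non-empty, and every waiting payload is assumed to be for d), the order in which the rules are tried, and the sequential scheduler `run` are all hypothetical. In particular, `choice` returns the first candidate and does not implement the fair queue of the paper.

```python
EMPTY = None  # stands for the empty buffer


class Component:
    """Buffers of the connected component associated to destination d."""

    def __init__(self, adj, next_hop, d, max_degree):
        self.adj, self.next_hop, self.d, self.max_degree = adj, next_hop, d, max_degree
        self.bufR = {p: EMPTY for p in adj}
        self.bufE = {p: EMPTY for p in adj}
        self.waiting = {p: [] for p in adj}  # assumption: request_p iff non-empty
        self.delivered = []

    def choice(self, p):
        # Simplification: first candidate instead of the fair queue.
        candidates = [s for s in self.adj[p]
                      if self.bufE[s] is not EMPTY and self.next_hop.get(s) == p]
        if self.waiting[p]:
            candidates.append(p)
        return candidates[0] if candidates else None

    def color(self, p):
        used = {self.bufR[q][2] for q in self.adj[p] if self.bufR[q] is not EMPTY}
        return min(c for c in range(self.max_degree + 1) if c not in used)

    @staticmethod
    def same(buf, m, c):
        """True if buf holds (m, q', c) for some q'."""
        return buf is not EMPTY and buf[0] == m and buf[2] == c

    def step(self, p):
        """Execute one enabled rule of p, if any, and return its label."""
        R, E, s = self.bufR[p], self.bufE[p], self.choice(p)
        if p == self.d and E is not EMPTY:                                   # R6
            self.delivered.append(E[0])
            self.bufE[p] = EMPTY
            return "R6"
        if E is not EMPTY and p != self.d:                                   # R4
            copy, nh = (E[0], p, E[2]), self.next_hop[p]
            others = [r for r in self.adj[p] if r != nh]
            if self.bufR[nh] == copy and all(self.bufR[r] != copy for r in others):
                self.bufE[p] = EMPTY
                return "R4"
        if R is not EMPTY:
            m, q, c = R
            if self.same(self.bufE[q], m, c) and self.next_hop.get(q) != p:  # R5
                self.bufR[p] = EMPTY
                return "R5"
            if E is EMPTY and (q == p or not self.same(self.bufE[q], m, c)):  # R2
                self.bufE[p] = (m, p, self.color(p))
                self.bufR[p] = EMPTY
                return "R2"
        elif s is not None and s != p:                                       # R3
            m, _, c = self.bufE[s]
            self.bufR[p] = (m, s, c)
            return "R3"
        elif s == p:                                                         # R1
            self.bufR[p] = (self.waiting[p].pop(0), p, 0)
            return "R1"
        return None


def run(component, max_passes=100):
    """Sequential scheduler: scan the processors until no rule is enabled."""
    for _ in range(max_passes):
        fired = [component.step(p) for p in component.adj]
        if not any(fired):
            return
```

On the path a, b, d with correct routing tables, a message generated by a is delivered to d once, and all buffers are empty at the end of the run.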
Figure 5: Examples of caterpillars associated to m on p (from left to right: two of type 1, one of
type 2 and one of type 3).



In this proof, we consider that the notion of message is different from the notion of useful
information. This implies that two messages with the same useful information generated by the
same processor are considered as two different messages. We must prove that the algorithm does
not lose one of them thanks to the use of the flag. Let γ be a configuration of the network. We say
that a message m exists in γ if at least one buffer contains m in γ. We say that m exists
on p in γ if at least one buffer of p contains m in γ.

Definition 3 (Caterpillar of a message m) Let m be a message of Destination d existing on a
processor p in a configuration γ. We define a caterpillar associated to m as the longest sequence of
buffers that satisfies one of the three definitions below:

1. Caterpillar of type 1: (bufR_p(d) = (m, q, c)) ∧ ((bufE_q(d) ≠ (m, q', c)) ∨ (q = p)).

2. Caterpillar of type 2: (bufE_p(d) = (m, q, c)) ∧ (bufR_{nextHop_p(d)}(d) ≠ (m, p, c)).

3. Caterpillar of type 3: (bufE_p(d) = (m, q', c)) ∧ ∃q ∈ N_p, (bufR_q(d) = (m, p, c)).

The reader can find in Figure 5 an example for each type of caterpillar. Remark that an emission
buffer can belong to several caterpillars of type 3.
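The three predicates can be evaluated mechanically on a configuration. The Python sketch below is an illustration with hypothetical names (`same_message`, `caterpillar_types`); it assumes buffers are stored as None (empty) or as triplets (m, q, c), and that the next hop of the destination itself is simply absent from the routing table.

```python
def same_message(buf, m, c):
    """True if buf holds (m, q', c) for some q'."""
    return buf is not None and buf[0] == m and buf[2] == c


def caterpillar_types(p, bufR, bufE, next_hop, adj):
    """Types of the caterpillars that start in the buffers of p."""
    types = set()
    R, E = bufR[p], bufE[p]
    if R is not None:
        m, q, c = R
        if q == p or not same_message(bufE[q], m, c):
            types.add(1)
    if E is not None:
        m, _, c = E
        copy = (m, p, c)
        if bufR.get(next_hop.get(p)) != copy:
            types.add(2)
        if any(bufR[q] == copy for q in adj[p]):
            types.add(3)
    return types
```

For example, while a copy of a message is present both in the emission buffer of a and in the reception buffer of its next hop b, the buffers of a start a caterpillar of type 3; once the emission buffer of a is erased, the reception buffer of b becomes a caterpillar of type 1.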

Lemma 1 Let γ be a configuration in which routing tables are correct. Let m be a message existing
on p in γ. Under a weakly fair daemon, the execution of SSMFP_1 produces in a finite time one
of the following effects for any caterpillar of type 1 associated to m:

• m is delivered to its destination.

• the caterpillar disappears from p and there exists a caterpillar of type 1 associated to the same
message on nextHop_p(d).

Proof. Let γ be a configuration in which routing tables are correct. Let m (of Destination d)
be a message existing in γ. Let C = bufR_p(d) be a caterpillar of type 1 associated to m. Denote
by δ the distance between p and d (δ = dist(p, d)). We are going to prove the result by induction
on δ. We define the following predicate:

(P_δ): if C = bufR_p(d) is a caterpillar of type 1 associated to m such that dist(p, d) = δ, then,
under a weakly fair daemon, the execution of SSMFP_1 produces one of the following effects in a
finite time:

• m is delivered to d.

• C disappears from p and there exists a caterpillar of type 1 associated to the same message
on nextHop_p(d).

Initialization: We are going to prove that (P_0) is true.

Let C = bufR_p(d) be a caterpillar of type 1 associated to m such that dist(p, d) = 0. This
implies that p = d. Let bufR_p(d) = (m, q, id). We must distinguish two cases:

Case 1: bufE_p(d) ≠ ε.

The rule (R6) is enabled for the processor p. We can observe that this rule cannot be
neutralized. Since we assumed a weakly fair daemon, p executes (R6)
in a finite time. We can then consider case 2, since this rule erases the content of
bufE_p(d).

Case 2: bufE_p(d) = ε.

By the definition of a caterpillar of type 1, (R2) is enabled for p. This rule can be
neutralized if and only if bufE_q(d) is occupied by (m, q', id). This is impossible by the
construction of color_q(d). Since we assumed a weakly fair daemon, p
executes (R2) in a finite time. C disappears and a new caterpillar of type 2 appears in
bufE_p(d). By the same reasoning as in case 1, p executes (R6) in a
finite time. This implies that m is delivered to d.

We proved that (P_0) is true.

Induction: Let δ ≥ 1. We assume that (P_{δ−1}) is true and prove that (P_δ) is then
true.

Let C = bufR_p(d) be a caterpillar of type 1 associated to m such that dist(p, d) = δ. Let
bufR_p(d) = (m, q, id). We must distinguish two cases:

Case 1: bufE_p(d) ≠ ε.

Let r = nextHop_p(d).

Case 1.1: bufE_p(d) is occupied by a caterpillar C' of type 2.

By the definition of a caterpillar of type 2, either (R3) or (R1) is enabled on r if
and only if bufR_r(d) = ε.

Case 1.1.a: If bufR_r(d) = ε, then r executes (R3) or (R1) (since we assumed a
weakly fair daemon and these rules cannot be neutralized). The result of this
execution depends on the value of choice_r(d):

• If choice_r(d) = p, then C' becomes a caterpillar of type 3. We are now in
case 1.2.

• If choice_r(d) ≠ p, then a message (m', choice_r(d), id') is forwarded into bufR_r(d).
So, C' remains a caterpillar of type 2 and we are in case 1.1.b. It is
important to remark that the fairness of choice_r(d) guarantees that this case
cannot occur infinitely often.

Case 1.1.b: If bufR_r(d) = (m', q', id'), then we can distinguish two cases:

• If bufR_r(d) belongs to at least one caterpillar of type 3, we can apply the
reasoning of case 1.2 to bufE_{q'}(d) and conclude that bufR_r(d) belongs to
a caterpillar of type 1 in a finite time.

• If bufR_r(d) belongs to a caterpillar of type 1, then bufR_r(d) becomes
empty in a finite time by application of (P_{δ−1}) (dist(r, d) = dist(p, d) −
1 = δ − 1 since routing tables are correct). Then, we are in case 1.1.a.

We can conclude that bufE_p(d) belongs to a caterpillar of type 3 associated to m
in a finite time. So, we are in case 1.2.
Case 1.2: bufE_p(d) belongs to at least one caterpillar of type 3.

Case 1.2.a: bufE_p(d) belongs to at least two caterpillars of type 3.

This implies that there exists x ∈ N_p \ {r} such that bufR_x(d) = (m, p, id). The processor
x remains continuously enabled by (R5) (since routing tables are correct and p cannot
erase bufE_p(d) by the construction of (R4)). Since we assumed a weakly fair
daemon, (m, p, id) is erased from bufR_x(d) in a finite time. We can repeat
this reasoning until bufE_p(d) belongs to only one caterpillar of type 3, since
the construction of (R3) guarantees that it is impossible to create a new
caterpillar of type 3 involving bufE_p(d). So, we are in case 1.2.b.

Case 1.2.b: bufE_p(d) belongs to only one caterpillar of type 3.

By the definition of a caterpillar of type 3, p is enabled for
(R4). The construction of (R3) guarantees that it is impossible to create
a new caterpillar of type 3 involving bufE_p(d); also, (R3) is not neutralized.
As we assumed a weakly fair daemon, p executes (R4) in a finite time. Then,
bufE_p(d) is empty in a finite time and we are in case 2.

We can conclude case 1 with the following statement: we are in case 2 in a finite
time.

Case 2: bufE_p(d) = ε.

By the definition of a caterpillar of type 1, p is enabled by (R2). By the construction
of color_q(d) and of (R2) (for q), (R2) cannot be neutralized for p. Since we assumed a
weakly fair daemon, p executes (R2) in a finite time. This implies that
C disappears and a new caterpillar C' of type 2 associated to m appears. We can now
apply the reasoning of case 1 to deduce that C' becomes a caterpillar of type 1 on r
in a finite time.

We have proved that (P_δ) is true, which ends the proof. □

Lemma 2 If routing tables are correct, every processor can generate a first message (i.e. it can
execute (R1)) in a finite time under a weakly fair daemon.

Proof. Let p be a processor which has a message m (of Destination d) to send. As p has a
waiting message, we have request_p = true whatever its value in the initial configuration. We must
now study two cases:

Case 1: bufR_p(d) = ε.

The processor p executes either (R3) or (R1) in a finite time (since we assumed a weakly
fair daemon and these rules cannot be neutralized). The result of this execution depends on
the value of choice_p(d):

• If choice_p(d) = p, then p executes (R1) in a finite time and we obtain the result.

• If choice_p(d) ≠ p, then p executes (R3) in a finite time. Consequently, bufR_p(d) is
occupied by a caterpillar of type 3. So, we are in case 2.1. Note that the fairness of
choice_p(d) guarantees that this case cannot occur infinitely often.

Case 2: bufR_p(d) = (m', q, id).

Case 2.1: bufR_p(d) belongs to a caterpillar C of type 3.

We can apply the reasoning of case 1.2 of the proof of Lemma 1 to bufE_q(d) and
conclude that C becomes a caterpillar of type 1 in a finite time. We are now in case
2.2.

Case 2.2: bufR_p(d) belongs to a caterpillar C of type 1.

We can apply Lemma 1 to C and conclude that bufR_p(d) becomes empty in a finite time.
We are now in case 1.

By the remark of case 1, this reasoning is finite, which proves the result. □

Lemma 3 If a message m is generated by SSMFP_1 in a configuration in which routing tables
are correct, SSMFP_1 delivers m to its destination in a finite time under a weakly fair daemon.

Proof. Assume that routing tables are correct when SSMFP_1 accepts a message m (of
Destination d) on Processor p. This implies that p generated m by executing rule (R1). This rule
leads to the creation of a caterpillar of type 1 associated to m in bufR_p(d). Since routing tables
are assumed correct and constant, the result follows from dist(p, d) + 1 applications of Lemma 1.
□

Proposition 1 SSMFP_1 is a snap-stabilizing message forwarding protocol for SP' if routing
tables are correct in the initial configuration.

Proof. Assume that routing tables are correct in the initial configuration. To prove that
SSMFP_1 is a snap-stabilizing message forwarding protocol for specification SP', we must prove
that:

1. If a processor p requests to send a message, then the protocol is initiated by at least one
starting action on p in a finite time. In our case, the starting action is the execution of (R1).
Lemma 2 proves this property.

2. After a starting action, the protocol is executed according to SP'. If we consider that (R1)
has been executed at least once, we can prove that:

• The first property of SP' is always satisfied (following Lemma 2 and the fact that the
waiting for the sending of new messages is blocking).

• The second property of SP' is always satisfied (following Lemma 3).

Consequently, we deduce the proposition. □

Proposition 2 SSMFP_1 is a self-stabilizing message forwarding protocol for SP' (even if routing
tables are corrupted in the initial configuration) when A runs simultaneously.

Proof. Recall that A is a self-stabilizing silent algorithm for computing routing tables running
simultaneously with SSMFP_1. Moreover, we assumed that A has priority over SSMFP_1 (i.e. a
processor which has enabled actions for both algorithms always chooses the action of A). This
guarantees that routing tables are correct and constant in a finite time regardless of the initial
state.

By Proposition 1, SSMFP_1 is a snap-stabilizing message forwarding protocol for specification
SP' when it starts from such a configuration. Consequently, we obtain the proposition. □

Lemma 4 Under a weakly fair daemon, SSMFP_1 does not delete a valid message without delivering
it to its destination, even if A runs simultaneously.

Proof. By contradiction, let m be a valid message which is deleted without being delivered to
its destination.

By the construction of the rule (R2), this cannot be the result of an internal forwarding since
the message is sequentially copied in bufE_p(d) and erased from bufR_p(d).

By the construction of rules (R5) and (R4), this cannot be the result of the execution of (R5)
(since we are guaranteed that m is in bufE_q(d) and cannot be erased from this buffer
simultaneously).

By the construction of rules (R4) and (R2), m cannot be erased from bufR_{nextHop_p(d)}(d) in
the step in which it is erased from bufE_p(d).

Since we have seen that a simultaneous erasing is impossible, the hypothesis implies that m is
erased from a buffer bufE_p(d) without being copied in another buffer.

The only rule which erases a message from bufE_p(d) and does not deliver m is (R4). If a
processor p executes this rule, then we have bufE_p(d) = (m, q, id) and bufR_{nextHop_p(d)}(d) = (m, p, id).
Assume that the message contained in bufR_{nextHop_p(d)}(d) is not the result of the application of
rule (R3) on bufE_p(d). If this message was in bufR_{nextHop_p(d)}(d) before m came in bufE_p(d),
we obtain a contradiction with the definition of color_p(d). This implies that this message came
in bufR_{nextHop_p(d)}(d) after m came in bufE_p(d). Then, the construction of (R3) allows us to say
that bufR_{nextHop_p(d)}(d) contains a message (m, q', id) with q' ≠ p (since we have supposed that the
message does not come from bufE_p(d)). We obtain a contradiction. We can conclude that, when
we have bufE_p(d) = (m, q, id) and bufR_{nextHop_p(d)}(d) = (m, p, id), the message m has been copied
at least once. This result contradicts the existence of m. □

Lemma 5 Under a weakly fair daemon, SSMFP_1 never duplicates a valid message, even if A
runs simultaneously.

Proof. Since the emission of a message creates one caterpillar of type 1 by the construction of
the rule (R1), it remains to prove the following property: if a caterpillar of type 1 associated to
a message m is present on a processor p and this message is erased from all buffers of p, then only
one neighbor of p contains a caterpillar of type 1 associated to m or m has been delivered to its
destination.

Let C be a caterpillar of type 1 associated to a message m (of Destination d) on a processor p.
Since (R5) is not enabled for p (by definition of a caterpillar of type 1), m is erased from bufR_p(d)
by (R2). So, m is still present on p (since it has been copied in bufE_p(d)). Then, we have two
cases to observe:

Case 1: p = d.

The only rule for erasing m which can be enabled is (R6). This rule delivers m to its
destination.

Case 2: p ≠ d.

The only rule for erasing m which can be enabled is (R4). The construction of this rule
implies the announced property.

We can conclude that m is delivered at most once to its destination, which proves the result. □

Theorem 1 SSMFP_1 is a snap-stabilizing message forwarding protocol for SP (even if routing
tables are corrupted in the initial configuration) when A runs simultaneously.

Proof. Proposition 2 and Lemma 4 allow us to conclude that SSMFP_1 is a snap-stabilizing
message forwarding protocol for specification SP' even if routing tables are corrupted in the initial
configuration, on condition that A runs simultaneously.

Then, using this remark and Lemma 5, we obtain the result. □

4.4 Time complexities

Since our algorithm needs a weakly fair daemon, there is no point in doing an analysis in terms
of steps. This is why all the following complexity analyses are given in rounds. Let R_A be the
stabilization time of A in terms of rounds.

In order to lighten this paper, we present only the key ideas of the proofs of this section.

Proposition 3 For any Processor d, SSMFP_1 delivers 2n invalid messages to d in the worst
case.

Sketch of proof. In the initial configuration, the system has at most 2n distinct invalid
messages of Destination d (since the connected component of the buffer graph associated to d has
2n buffers). In the worst case, all these invalid messages are delivered to their destination, which
allows us to reach the announced bound. □

Proposition 4 In the worst case, a message m (of Destination d) needs O(max(R_A, Δ^D)) rounds
to be delivered to d once it has been generated by its source.

Sketch of proof. First, we show by induction the following result: if γ is a configuration
in which routing tables are correct and C is a caterpillar of type 1 associated to a message m (of
Destination d) on a processor p such that dist(p, d) = δ, then m is delivered to d or there exists a
caterpillar of type 1 associated to m on nextHop_p(d) in at most O(Δ^δ) rounds. This result is due
to the fairness of choice_p(d), which can allow at most Δ messages to "pass" m (see the proof of
Lemma 1).

Then, consider that s is the source of a message m of Destination d. We have dist(s, d) ≤ D by
definition. We can conclude that m is delivered in at most Σ_{δ=0}^{D} O(Δ^δ) ∈ O(Δ^D) rounds if routing
tables are correct when m is emitted.

Finally, we can deduce the result when m is emitted in a configuration in which routing tables are
not correct, since the message is delivered in at most O(Δ^D) rounds after routing tables computation
(which takes at most O(R_A) rounds if m is not delivered during the routing tables computation,
since we have assumed the priority of A over SSMFP_1). □
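
The second step of the sketch above sums the per-hop bounds over all distances. As a worked check of that sum, assuming Δ ≥ 2 (for Δ = 1 the sum is simply D + 1) and replacing each O(Δ^δ) term by Δ^δ up to a common constant factor, the geometric series gives:

```latex
\sum_{\delta=0}^{D} \Delta^{\delta}
  = \frac{\Delta^{D+1}-1}{\Delta-1}
  \le \frac{\Delta}{\Delta-1}\,\Delta^{D}
  \le 2\,\Delta^{D}
  \qquad (\Delta \ge 2)
```

so the total number of rounds stays in O(Δ^D), dominated by the last term of the sum.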

Proposition 5 The delay (waiting time before the first emission) and the waiting time (between
two consecutive emissions) of SSMFP_1 are O(max(R_A, Δ^D)) rounds in the worst case.

Sketch of proof. Let p be a processor which has a message m of Destination d to emit. By the
fairness of choice_p(d), m is generated after at most (Δ − 1) releases of bufR_p(d)
(see the proof of Lemma 1). The result of Proposition 4 allows us to say that bufR_p(d) is released in
O(max(R_A, Δ^D)) rounds at worst. Hence, we can deduce the result. □

The complexity obtained in Proposition 4 is due to the fact that the system delivers a huge
quantity of messages during the forwarding of the considered message. This is why we now consider
the amortized complexity (in rounds) of our algorithm. For an execution T, this measure is equal
to the number of rounds of T divided by the number of delivered messages during T (see [5] for a
formal definition).

Proposition 6 The amortized complexity (to forward a message) of SSMFP_1 is O(max(R_A, D))
rounds.

Sketch of proof. First, we must prove the following property: if γ is a configuration
in which at least one message of Destination d is present and in which routing tables are correct,
then SSMFP_1 delivers at least one message to d in the 3D rounds following γ.

The proof of this property is done as follows. Let δ be the smallest number such that there
exists a message m of Destination d on a processor p which satisfies dist(p, d) = δ. Then, we prove
that, after at most three rounds, there exists a message (not necessarily m) on a processor p' which
satisfies dist(p', d) = δ − 1. Since δ ≤ D in γ, we obtain the announced property.

Assume now an initial configuration in which routing tables are correct. Let T be an execution
leading to the worst amortized complexity. Let R_T be the number of rounds of T. By the previous
property, SSMFP_1 delivers at least R_T / (3D) messages during T. So, we have an
amortized complexity of R_T / (R_T / (3D)) = 3D ∈ Θ(D). Then, the announced result is obvious. □
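
The arithmetic behind this sketch of proof can be checked with a few lines of Python. This is only an illustration: the function name `amortized_bound` and its parameters are hypothetical, and the code assumes (as in the property above) at least one delivery in every full window of 3D rounds.

```python
def amortized_bound(total_rounds, D):
    """Upper bound on the number of rounds per delivered message.

    Assumes at least one message is delivered in every full window of
    3 * D rounds, which is the property used in the sketch of proof.
    """
    window = 3 * D
    if total_rounds < window:
        raise ValueError("execution shorter than one window of 3 * D rounds")
    # Lower bound on the number of deliveries during the execution.
    delivered_at_least = total_rounds // window
    return total_rounds / delivered_at_least
```

When the number of rounds is a multiple of 3D the bound is exactly 3D; otherwise the integer division adds less than a factor of two, so the bound remains linear in D.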

4.5 Conclusion

In this section, we proved that we can adapt the "destination-based" deadlock-free controller defined
in [21] to obtain a snap-stabilizing message forwarding algorithm. Our algorithm is mainly based
on the control of the effects of routing table moves on messages. This control is performed in two ways.
Firstly, we "slow down" messages by using two buffers per processor in order to control the number
of copies of the same message in the network at a given time. Secondly, we use a specific flag to avoid
message merges or duplications.

The initial fault-free protocol uses n^2 buffers for the whole network and our protocol uses 2n^2
buffers. Consequently, our protocol ensures stronger safety and fault-tolerance than the
initial one without a significant overcost in space. Our time analysis (see Section 4.4) shows that
this stronger safety does not lead to an overcost in time.

5 Second protocol

5.1 Informal description

In this section, we give a second snap-stabilizing message forwarding protocol adapted to the
"distance-based" deadlock-free controller (see Section 3). Our idea is to adapt this scheme in order
to tolerate transient faults. To achieve this goal, we assume the existence of a self-stabilizing
silent (i.e. no actions are enabled after convergence) algorithm A to compute routing tables which
runs simultaneously with our message forwarding protocol. Moreover, we assume that A has priority
over our protocol (i.e. a processor which has enabled actions for both algorithms always chooses
the action of A). This guarantees that routing tables are correct and constant in a finite
time. To simplify the presentation, we assume that A induces only minimal paths in number of
edges. We assume that our protocol has access to the routing table via a function called
nextHop_p(d). This function returns the identity of the neighbor of p to which p must forward
messages of Destination d.

Our idea is as follows. We choose exactly the same buffer graph as [21] and we allow the erasing
of a message only if we are assured that the message has been delivered to its destination. To this
end, we use an acknowledgment scheme which guarantees the reception of the message.

More precisely, we associate to each copy of the message a type which has 3 values: S (Sending),
A (Acknowledgment) and F (Fail). Forwarding of a valid message follows the scheme below:

1. Generation with type S in a buffer of rank 1.

2. Forwarding (with copy in buffers of increasing rank) with type S without any erasing.

3. If the message reaches its destination:

(a) It is delivered and the copy of the message takes type A.

(b) Type A is propagated to the sink of the message following the incoming path.

(c) Buffers are allowed to free themselves once the type A is propagated to the previous
buffer on the path.

(d) The sink erases its copy, which completes the erasing of the message.

4. Otherwise (the message reaches a buffer of rank D + 1 without crossing its destination):

(a) The copy of the message takes type F.

(b) Type F is propagated to the sink of the message following the incoming path.

(c) Buffers are allowed to free themselves once the type F is propagated to the previous
buffer on the path.

(d) Then, the sink of the message gives the type S to its copy, which begins a new cycle (the
message is sent once again).

Obviously, it is necessary to take into account invalid messages: we have chosen to let them follow
the forwarding scheme and to erase them if they reach step 4.d.
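
The cycle described above can be sketched as follows. This is a minimal illustration and not the protocol itself: the function name `sending_cycle`, the representation of a route as a Python list of processor identities, and the returned tuples are assumptions made for the example. It only determines with which type (A, F, or still S) a single sending cycle stands after the message has crossed the given processors.

```python
def sending_cycle(route, dest, D):
    """Outcome of one sending cycle of a message.

    route: processors crossed by the message, route[0] being the processor
           that generated it in its buffer of rank 1 (hypothetical encoding).
    dest:  identity of the destination.
    D:     diameter of the network (buffers have ranks 1 .. D + 1).
    Returns (type, rank) where rank is the rank of the last buffer reached.
    """
    for rank, proc in enumerate(route[:D + 1], start=1):
        if proc == dest:
            # Step 3: the message is delivered and its copy takes type A.
            return ("A", rank)
    if len(route) >= D + 1:
        # Step 4: rank D + 1 reached without crossing the destination.
        return ("F", D + 1)
    # The message is still being forwarded with type S.
    return ("S", len(route))
```

For instance, a message bouncing between two processors because of corrupted routing tables ends its cycle with type F at rank D + 1, after which step 4.d starts a new cycle.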

The key idea of the snap-stabilization of our algorithm is the following: since a valid message 
is never erased, it is sent again after the stabilization of routing tables (if it never reached its 
destination before) and it is then normally forwarded. 

To avoid livelocks, we use a fair scheme of selection of processors allowed to forward a message 
for each buffer. We can manage this fairness by a queue of requesting processors. Finally, we use a 
specific flag to prevent message losses. It is composed of the identity of the next processor on the 
path of the message, the identity of the last processor crossed by the message, the identity of
the destination of the message and the type of the message (S, A or F). 
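
One possible way to realize such a fair selection is sketched below. It is an assumption-laden illustration: the class `FairChoice` and its method `choose` are hypothetical names, and the sketch keeps all neighbors in the queue (serving the least recently served requester) rather than only the requesting processors.

```python
from collections import deque


class FairChoice:
    """Fair selection among neighbors that request a buffer (illustrative)."""

    def __init__(self, neighbors):
        # Neighbors ordered from least recently served to most recently served.
        self.queue = deque(neighbors)

    def choose(self, requesting):
        """Return the least recently served requesting neighbor, or None."""
        for proc in self.queue:
            if proc in requesting:
                # Serve it, then move it to the back of the queue.
                self.queue.remove(proc)
                self.queue.append(proc)
                return proc
        return None
```

With this discipline, a neighbor that keeps requesting is served after at most (number of neighbors − 1) other selections, which is the kind of bound the fairness argument relies on.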

We must manage a communication between our algorithm and processors in order to know when
a processor has a message to send. We have chosen to create a Boolean shared variable request_p
(for any processor p). Processor p can set it to true when it is false and when p has a message to
send. Otherwise, p must wait until our algorithm sets the shared variable to false (when a message
is sent out).
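
The handshake on request_p can be sketched as follows. The class and method names (`RequestFlag`, `higher_layer_submit`, `protocol_send`) are hypothetical and only model the shared Boolean, not the forwarding rules.

```python
class RequestFlag:
    """Shared Boolean request_p between the higher layer and the protocol."""

    def __init__(self, initial=False):
        # In a stabilizing setting the initial value may be arbitrary.
        self.request = initial
        self.waiting = None

    def higher_layer_submit(self, message):
        """Higher layer: may set the flag to true only when it is false."""
        if self.request:
            return False  # must wait until the protocol resets the flag
        self.waiting = message
        self.request = True
        return True

    def protocol_send(self):
        """Protocol: takes the waiting message and sets the flag to false."""
        if not self.request:
            return None
        message, self.waiting = self.waiting, None
        self.request = False
        return message
```

A second submission is refused until the protocol has sent out the first message, which mirrors the blocking wait described above.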

5.2 Algorithm

We now formally present our protocol in Algorithm 2. We call it SSMFP_2 for Snap-Stabilizing
Message Forwarding Protocol 2.

Algorithm 2 SSMFP_2: Message forwarding protocol for processor p.
Data:

- n, D: natural numbers equal resp. to the number of processors and to the diameter of the network.

- I = {0, ..., n − 1}: set of processor identities of the network.

- N_p: set of neighbors of p.

Message:

- (m, r, q, d, c) with m useful information of the message, r ∈ N_p identity of the next processor to cross for
the message (when it reaches the node), q ∈ N_p identity of the last processor crossed by the message,
d ∈ I identity of the destination of the message, c ∈ {S, A, F} type of the message.

Variables:

- ∀i ∈ {1, ..., D + 1}, buf_p(i): buffer which can contain a message or be empty (denoted by ε)

Input / Output:

- request_p: Boolean. The higher layer can set it to "true" when its value is "false" and when there is a
waiting message. We consider that this waiting is blocking.

- nextMes_p: gives the message waiting in the higher layer.

- nextDest_p: gives the destination of nextMes_p if it exists, null otherwise.

Procedures:

- nextHop_p(d): neighbor of p given by the routing for Destination d (if d = p, we choose arbitrarily r ∈ N_p).

- ∀i ∈ {2, ..., D + 1}, choice_p(i): fairly chooses one of the processors which can send a message in buf_p(i), i.e.
choice_p(i) satisfies predicate ((choice_p(i) ∈ N_p) ∧ (buf_{choice_p(i)}(i − 1) = (m, p, q, d, S)) ∧ (choice_p(i) ≠ d)).
We can manage this fairness with a queue of length Δ + 1 of processors which satisfy the predicate.

- deliver_p(m): delivers the message m to the higher layer of p.

Rules:

/* Rules for the buffer of rank 1 */
/* Generation of messages */

(R1) :: request_p ∧ (buf_p(1) = ε) ∧ (nextDest_p = d) ∧ (nextMes_p = m) ∧ (buf_{nextHop_p(d)}(2) ≠
(m, r', p, d, c)) → buf_p(1) := (m, nextHop_p(d), r, d, S) with r ∈ N_p; request_p := false

/* Processing of acknowledgment */

(R2) :: (buf_p(1) = (m, r, q, d, F)) ∧ (d ≠ p) ∧ (buf_r(2) ≠ (m, r', p, d, F)) → buf_p(1) :=
(m, nextHop_p(d), q, d, S)

(R3) :: (buf_p(1) = (m, r, q, d, A)) ∧ (d ≠ p) ∧ (buf_r(2) ≠ (m, r', p, d, A)) → buf_p(1) := ε

/* Management of messages which reach their destinations */

(R4) :: buf_p(1) = (m, r, q, p, S) → deliver_p(m); buf_p(1) := (m, r, q, p, A)

(R5) :: buf_p(1) = (m, r, q, p, A) → buf_p(1) := ε

(R6) :: buf_p(1) = (m, r, q, p, F) → buf_p(1) := (m, r, q, p, S)

/* Rule for buffers of rank 1 to D: propagation of acknowledgment */

(R7) :: ∃i ∈ {1, ..., D}, ((buf_p(i) = (m, r, q, d, S)) ∧ (p ≠ d) ∧ (buf_r(i + 1) = (m, r', p, d, c)) ∧ (c ∈
{F, A})) → buf_p(i) := (m, r, q, d, c)

/* Rules for buffers of rank 2 to D */
/* Forwarding of messages */

(R8) :: ∃i ∈ {2, ..., D}, ((buf_p(i) = ε) ∧ (choice_p(i) = s) ∧ (buf_s(i − 1) = (m, p, q, d, S)) ∧
(buf_{nextHop_p(d)}(i + 1) ≠ (m, r, p, d, c))) → buf_p(i) := (m, nextHop_p(d), s, d, S)

/* Erasing of messages of which the acknowledgment has been forwarded */

(R9) :: ∃i ∈ {2, ..., D}, ((buf_p(i) = (m, r, q, d, c)) ∧ (c ∈ {F, A}) ∧ (d ≠ p) ∧ (buf_q(i − 1) =
(m, p, q', d, c)) ∧ (buf_r(i + 1) ≠ (m, r', p, d, c))) → buf_p(i) := ε

/* Rules for buffers of rank 2 to D + 1 */
/* Consumption of a message and generation of the acknowledgment A */

(R10) :: ∃i ∈ {2, ..., D + 1}, buf_p(i) = (m, r, q, p, S) → deliver_p(m); buf_p(i) := (m, r, q, p, A)

/* Erasing of messages of destination p of which the acknowledgment has been forwarded */

(R11) :: ∃i ∈ {2, ..., D + 1}, ((buf_p(i) = (m, r, q, p, c)) ∧ (c ∈ {F, A}) ∧ (buf_q(i − 1) = (m, p, q', p, c))) →
buf_p(i) := ε

/* Rules for the buffer of rank D + 1 */
/* Forwarding of messages */

(R12) :: (buf_p(D + 1) = ε) ∧ (choice_p(D + 1) = s) ∧ (buf_s(D) = (m, p, q, d, S)) → buf_p(D + 1) :=
(m, nextHop_p(d), s, d, S)

/* Generation of the acknowledgment F */

(R13) :: (buf_p(D + 1) = (m, r, q, d, S)) ∧ (d ≠ p) → buf_p(D + 1) := (m, r, q, d, F)

/* Erasing of messages of which the acknowledgment has been forwarded */

(R14) :: (buf_p(D + 1) = (m, r, q, d, c)) ∧ (c ∈ {F, A}) ∧ (d ≠ p) ∧ (buf_q(D) = (m, p, q', d, c)) →
buf_p(D + 1) := ε

/* Correction rules: erasing of tail of abnormal caterpillars of type F, A (cf. definitions below) */

(R15) :: ∃i ∈ {2, ..., D}, ((buf_p(i) = (m, r, q, d, c)) ∧ (c ∈ {F, A}) ∧ (buf_r(i + 1) ≠ (m, r', p, d, c)) ∧
(buf_q(i − 1) ≠ (m, p, q', d, c'))) → buf_p(i) := ε

(R16) :: ∃i ∈ {2, ..., D}, ((buf_p(i) = (m, r, q, d, c)) ∧ (c ∈ {F, A}) ∧ (buf_r(i + 1) ≠ (m, r', p, d, c)) ∧
(buf_q(i − 1) = (m, p, q', d, c')) ∧ (c' ∈ {F, A} \ {c} ∨ q = d)) → buf_p(i) := ε

(R17) :: (buf_p(D + 1) = (m, r, q, d, c)) ∧ (c ∈ {F, A}) ∧ (buf_q(D) ≠ (m, p, q', d, c')) → buf_p(D + 1) := ε

(R18) :: (buf_p(D + 1) = (m, r, q, d, c)) ∧ (c ∈ {F, A}) ∧ (buf_q(D) = (m, p, q', d, c')) ∧ (c' ∈ {F, A} \ {c} ∨
q = d) → buf_p(D + 1) := ε

End of Algorithm 2

5.3 Proof of correctness

In order to simplify the proof, we introduce a second specification of the problem. This specification
allows message duplications.

Specification 3 (SP') Specification of message forwarding problem allowing duplication.

• Any message can be sent out in a finite time.

• Any valid message is delivered to its destination in a finite time.

In this section, we prove that SSMFP_2 is a snap-stabilizing message forwarding protocol for
specification SP. For that, we are going to prove successively that:

1. Copies of the same message have a particular structure. Then, we prove some properties of the
behavior of these structures under SSMFP_2 (Lemmas 6, 7, 8 and 9).

2. SSMFP_2 is a snap-stabilizing message forwarding protocol for specification SP' if routing
tables are correct in the initial configuration (Lemmas 10, 11, 12 and Proposition 7).

3. SSMFP_2 is a self-stabilizing message forwarding protocol for specification SP' even if routing
tables are corrupted in the initial configuration (Proposition 8).

4. SSMFP_2 is a snap-stabilizing message forwarding protocol for specification SP even if routing
tables are corrupted in the initial configuration (Lemma 13 and Theorem 2).

In this proof, we consider that the notion of message is different from the notion of useful
information. This implies that two messages with the same useful information sent by the same
processor are considered as two different messages. We must prove that the algorithm does not
lose one of them thanks to the use of the flag.

Preliminaries. First, we define a particular structure of messages and we study the
behavior of these structures under SSMFP_2. Let γ be a configuration of the network. We say
that a message m is existing in γ if at least one buffer contains m in γ.

Definition 4 (Caterpillar of a message m) Let m be a message of Destination d existing in a
configuration γ. We define a caterpillar associated to m (noted C_m) as the longest sequence of
buffers C_m = buf_{p_1}(i)...buf_{p_t}(i + t − 1) (with t ≥ 1) which satisfies:

• ∀j ∈ {1, ..., t − 1}, p_j ≠ d and p_{j+1} ≠ p_j.

• ∀j ∈ {1, ..., t}, buf_{p_j}(i + j − 1) = (m, r_j, q_j, d, c_j).

• ∀j ∈ {1, ..., t − 1}, r_j = p_{j+1}.

• ∀j ∈ {2, ..., t}, q_j = p_{j−1}.

• ∃k ∈ {1, ..., t + 1}, (∀j ∈ {1, ..., k − 1}, c_j = S) and
((∀j ∈ {k, ..., t}, c_j = A) ∨ (∀j ∈ {k, ..., t}, c_j = F)).

We call respectively buf_{p_1}(i), buf_{p_t}(i + t − 1), and lg_{C_m} = t the tail, the head, and the length of C_m.

Figure 6: Examples of caterpillar (at left: abnormal of type A, at right: normal of type F).

We now give some characterizations of caterpillars.

Definition 5 (Characterization of caterpillar of a message m) Let m be a message of
Destination d in a configuration γ and C_m = buf_{p_1}(i)...buf_{p_t}(i + t − 1) (t ≥ 1) a caterpillar associated
to m. Then,

• C_m is a normal caterpillar if i = 1. It is abnormal otherwise (i ≥ 2).

• C_m is a caterpillar of type S if ∀j ∈ {1, ..., t}, c_j = S (i.e. k = t + 1).

• C_m is a caterpillar of type A if ∃j ∈ {1, ..., t}, c_j = A (i.e. k < t + 1).

• C_m is a caterpillar of type F if ∃j ∈ {1, ..., t}, c_j = F (i.e. k < t + 1).

It is obvious that each caterpillar C_m is either normal or abnormal. In the same way,
C_m is of exactly one type among S, A and F. The reader can find in Figure 6 examples of some types of
caterpillar.
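
The two definitions above can be turned into a small classifier. The sketch below is illustrative only: the function name `classify_caterpillar` and the encoding of a caterpillar as the rank of its tail plus the list of types c_1, ..., c_t are assumptions made for the example; it does not check the conditions on r_j, q_j and p_j.

```python
def classify_caterpillar(first_rank, types):
    """Classify a caterpillar from the rank i of its tail and the types
    c_1, ..., c_t of its buffers (tail first). Returns (shape, type)."""
    if not types:
        raise ValueError("a caterpillar has length t >= 1")
    # k - 1 is the number of leading buffers of type S.
    k = 1
    while k <= len(types) and types[k - 1] == "S":
        k += 1
    suffix = set(types[k - 1:])
    # Buffers k .. t must all be of type A or all be of type F.
    if len(suffix) > 1 or not suffix <= {"A", "F"}:
        raise ValueError("types must be S...S followed by only A or only F")
    shape = "normal" if first_rank == 1 else "abnormal"
    kind = "S" if not suffix else suffix.pop()
    return shape, kind
```

For example, a sequence starting at rank 3 with types S, A, A is an abnormal caterpillar of type A (here k = 2).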

Lemma 6 Let γ be a configuration and m be a message of Destination d existing in γ. Under a
weakly fair daemon, every abnormal caterpillar of type F (resp. A) associated to m disappears in
a finite time or becomes a normal caterpillar of type F (resp. A).

Proof. Let γ be a configuration of the network. Let m be an existing message (of Destination
d) in γ. Let C_m = buf_{p_1}(i)...buf_{p_t}(i + t − 1) (t ≥ 1 and i > 1) be an abnormal caterpillar of type F or
A associated to m. Let c be the type of C_m.

1. By definition of a caterpillar of type c, we have 1 ≤ k ≤ t. We can deduce that i + k − 2 <
i + t − 1 ≤ D + 1 and then (R7) is enabled for p_{k−1}. This rule cannot be neutralized since
Processor p_k is not enabled by a rule affecting its buffer of rank i + k. As the daemon is
weakly fair, p_{k−1} executes this rule in a finite time. We can repeat this reasoning k − 1 times
on Processors p_{k−1}, ..., p_1. Then, we obtain a caterpillar whose buffers are all of type c in a
finite time.

2. If t = 1, we can directly go to case 4. Otherwise (t ≥ 2), we must distinguish the following
cases:

Case 1: p_t = d.

Processor p_t is then enabled for rule (R11) by definition of a caterpillar and the fact that
all buffers of C_m are of type c. Note that Processor p_{t−1} is not enabled. Consequently,
this rule remains continuously enabled for p_t. Since the daemon is weakly fair, p_t executes
this rule in a finite time. Then, buf_{p_t}(i + t − 1) is empty in a finite time.

Case 2: p_t ≠ d.

Case 2.1: i + t − 1 = D + 1.

Then, Processor p_t is enabled for rule (R14) by definition of a caterpillar and the
fact that all buffers of C_m are of type c. Note that Processor p_{t−1} is not enabled.
Consequently, this rule remains continuously enabled for p_t. Since the daemon is weakly
fair, p_t executes this rule in a finite time. Then, buf_{p_t}(i + t − 1) is empty in a finite
time.

Case 2.2: i + t − 1 ≤ D.

Assume that buf_{p_t}(i + t − 1) = (m, r, q, d, c). Then, Processor p_t is enabled for rule
(R9) by definition of a caterpillar and the fact that all buffers of C_m are of type
c. Note that Processor p_{t−1} is not enabled and that Processor r cannot forward a
message (m, r', p_t, d, c) in its buffer of rank i + t (since buf_{p_t}(i + t − 1) is of type
c ≠ S). Consequently, this rule remains continuously enabled for p_t. Since the daemon
is weakly fair, p_t executes this rule in a finite time. Then, buf_{p_t}(i + t − 1) is empty
in a finite time.

3. By following a reasoning similar to the one of case 2.2, we can prove that p_{t−1}, ..., p_2 execute
(R9) sequentially in a finite time.

4. Then, we obtain a caterpillar of type c of length 1 satisfying i > 1. Assume that buf pi (i) = 
(m, r, q, d, c). We can distinguish the following cases: 

Case 1: $buf_q(i-1) = (m, p_1, q', d, c')$.

Case 1.1: $q = d$.

By the definition of a caterpillar of type $c$ of length 1 and the hypothesis, $p_1$ is
enabled for rule $(R_{16})$ (if $i \le D$) or $(R_{18})$ (if $i = D + 1$). By a reasoning similar to
the one of case 2.2 above, this rule remains infinitely enabled. Since the daemon
is weakly fair, $p_1$ executes this rule in a finite time. Consequently, $buf_{p_1}(i)$ becomes
empty in a finite time. Then, $C_m$ disappears.

Case 1.2: $q \ne d$.

Assume that $c' = S$. Then, $buf_q(i-1)$ belongs to $C_m$. This contradicts the fact
that $C_m$ is of type $c$. Consequently, $c' \in \{F, A\}$.

If $c' = c$, then the execution of rule $(R_7)$ by $p_1$ leads to the merge of two caterpillars
of type $c$. Then, consider the new caterpillar $C'_m = buf_{p'_1}(i') \ldots buf_{p'_{t'}}(i'+t'-1)$ (with
$buf_{p'_{t'}}(i'+t'-1) = buf_{p_1}(i)$). If $i' = 1$, then we have a normal caterpillar of type $c$.
Otherwise, we can restart the reasoning (this reasoning is guaranteed to be finite
since we have $1 \le i' < i$ at each step).

Consider now the case $c' \ne c$. By definition of a caterpillar of type $c$ of length 1 and
the hypothesis, $p_1$ is enabled by rule $(R_{16})$ (if $i \le D$) or $(R_{18})$ (if $i = D + 1$). By a
reasoning similar to the one of case 2.2 above, this rule remains infinitely enabled.
Since the daemon is weakly fair, $p_1$ executes this rule in a finite time. Consequently,
$buf_{p_1}(i)$ becomes empty in a finite time. Then, $C_m$ disappears.

Case 2: $buf_q(i-1) \ne (m, p_1, q', d, c')$.

By definition of a caterpillar of type $c$ of length 1 and the hypothesis, $p_1$ is enabled by
rule $(R_{15})$ (if $i \le D$) or $(R_{17})$ (if $i = D + 1$). By a reasoning similar to the one of
case 2.2 above, this rule remains infinitely enabled. Since the daemon is weakly fair,
$p_1$ executes this rule in a finite time. Consequently, $buf_{p_1}(i)$ becomes empty in a finite
time. Then, $C_m$ disappears.

In all cases, $C_m$ disappears or becomes a normal caterpillar of type $c$ in a finite time, which
leads us to the lemma. □
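
The recurring argument of this proof ("the rule remains enabled, the daemon is weakly fair, hence the processor executes it in a finite time") can be illustrated by a toy simulation. This is only a sketch under stated assumptions: the function `run_round_robin`, the dictionary `state` and the two-processor instance are hypothetical names introduced here, and a round-robin schedule that activates one processor at a time is just one particular weakly fair daemon, not the daemon of the paper.

```python
def run_round_robin(guards, actions, state, max_steps):
    """Activate processors in cyclic order; return the processors that executed."""
    n = len(guards)
    executed = []
    for step in range(max_steps):
        p = step % n                 # round-robin choice: one weakly fair schedule
        if guards[p](state):         # a processor acts only if its rule is enabled
            actions[p](state)
            executed.append(p)
    return executed


# Toy instance (hypothetical): processor 1 holds a message and is the only
# enabled processor; its rule stays enabled until it empties its buffer.
state = {"buf": "m"}
guards = [lambda s: False, lambda s: s["buf"] is not None]
actions = [lambda s: None, lambda s: s.update(buf=None)]
trace = run_round_robin(guards, actions, state, max_steps=4)
```

Because the guard of processor 1 stays true until it acts, any schedule that activates it at some point empties the buffer; the proofs above use exactly this pattern with the concrete rules of the protocol.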

Lemma 7 Let $\gamma$ be a configuration and $m$ be a message of Destination $d$ existing in $\gamma$. Under a
weakly fair daemon, every normal caterpillar of type $A$ associated to $m$ disappears in a finite time.

Proof. Let $\gamma$ be a configuration and $m$ be a message of Destination $d$ existing in $\gamma$. Let
$C_m = buf_{p_1}(1) \ldots buf_{p_t}(t)$ ($t \ge 1$) be a normal caterpillar of type $A$ associated to $m$. We must
distinguish the following cases:

Case 1: $t = 1$.

Case 1.1: $p_1 = d$.

Then, rule $(R_5)$ is enabled for $p_1$. Since the guard of this rule involves only local
variables, it remains infinitely enabled. Since the daemon is weakly fair, $p_1$ executes this
rule in a finite time. Consequently, $C_m$ disappears.

Case 1.2: $p_1 \ne d$.

By the definition of a caterpillar and the hypothesis, $p_1$ is enabled by rule $(R_3)$. By
a reasoning similar to the one of case 2.2.2 of the proof of Lemma 6, we can prove
that this rule remains infinitely enabled. Since the daemon is weakly fair, $p_1$ executes
this rule in a finite time. Consequently, $C_m$ disappears.

Case 2: $t \ge 2$.

We can apply the reasoning of points 1, 2, and 3 of the proof of Lemma 6. That leads us to
case 1.2.

In all cases, $C_m$ disappears in a finite time, which leads us to the lemma. □

Lemma 8 Let $\gamma$ be a configuration and $m$ be a message of Destination $d$ existing in $\gamma$. Under a
weakly fair daemon, every normal caterpillar of type $F$ associated to $m$ becomes a normal caterpillar
of type $S$ of length 1 in a finite time.

Proof. Let $\gamma$ be a configuration and $m$ be a message of Destination $d$ existing in $\gamma$. Let
$C_m = buf_{p_1}(1) \ldots buf_{p_t}(t)$ ($t \ge 1$) be a normal caterpillar of type $F$ associated to $m$. We must
distinguish the following cases:

Case 1: $t = 1$.

Case 1.1: $p_1 = d$.

Then, rule is enabled for $p_1$. Since the guard of this rule involves only local
variables, it remains infinitely enabled. Since the daemon is weakly fair, $p_1$ executes this
rule in a finite time. Consequently, $C_m$ becomes a caterpillar of type $S$ of length 1.

Case 1.2: $p_1 \ne d$.

By the definition of a caterpillar and the hypothesis, $p_1$ is enabled by rule (R?). By
a reasoning similar to the one of case 2.2.2 of the proof of Lemma 6, we can prove
that this rule remains infinitely enabled. Since the daemon is weakly fair, $p_1$ executes
this rule in a finite time. Consequently, $C_m$ becomes a caterpillar of type $S$ of length 1.

Case 2: $t \ge 2$.

We can apply the reasoning of points 1, 2, and 3 of the proof of Lemma 6. That leads us to
case 1.2.

In all cases, we proved that $C_m$ becomes a caterpillar of type $S$ of length 1 in a finite time, which
leads us to the lemma. □

Lemma 9 Let $\gamma$ be a configuration and $m$ be a message of Destination $d$ existing in $\gamma$. Under a
weakly fair daemon, every caterpillar of type $S$ associated to $m$ becomes a caterpillar of type $A$ or
$F$ in a finite time.

Proof. Let $\gamma$ be a configuration of the network and $m$ be a message (of Destination $d$) existing
in $\gamma$. Let $C_m = buf_{p_1}(i) \ldots buf_{p_t}(i+t-1)$ ($t \ge 1$) be a caterpillar of type $S$ associated to $m$.

We prove this result by a decreasing induction on the rank of the buffer occupied by the head
of $C_m$ in $\gamma$. Let us define the following property:

$(P_l)$: If $C_m$ satisfies $i + t - 1 = l$, then it becomes a caterpillar of type $A$ or $F$ in a finite time.

Initialization: We want to prove that $(P_{D+1})$ is true.

Let $C_m = buf_{p_1}(i) \ldots buf_{p_t}(i+t-1)$ ($t \ge 1$) be a caterpillar of type $S$ associated to $m$ such
that $i + t - 1 = D + 1$. We must distinguish the following cases:

Case 1: $p_t = d$.

By hypothesis, Processor $p_t$ is enabled for rule $(R_{10})$. Since the guard of this rule
involves only local variables, it remains infinitely enabled. Since the daemon is weakly
fair, $p_t$ executes this rule in a finite time. Consequently, $buf_{p_t}(i+t-1)$ becomes a buffer
of type $A$ and $C_m$ becomes a caterpillar of type $A$ in a finite time. Then, Property
$(P_{D+1})$ is satisfied.

Case 2: $p_t \ne d$.

By hypothesis, Processor $p_t$ is enabled for rule $(R_{13})$. Since the guard of this rule
involves only local variables, it remains infinitely enabled. Since the daemon is weakly
fair, $p_t$ executes this rule in a finite time. Consequently, $buf_{p_t}(i+t-1)$ becomes a buffer
of type $F$ and $C_m$ becomes a caterpillar of type $F$ in a finite time. Then, Property
$(P_{D+1})$ is satisfied.

Induction: Let $l \le D$. Assume that $(P_{l+1}), \ldots, (P_{D+1})$ are satisfied. We want to prove that
$(P_l)$ is then satisfied.

Let $C_m = buf_{p_1}(i) \ldots buf_{p_t}(i+t-1)$ ($t \ge 1$) be a caterpillar of type $S$ associated to $m$ such
that $i + t - 1 = l < D + 1$. We must distinguish the following cases:

Case 1: $p_t = d$.

Case 1.1: $i + t - 1 = 1$.

By hypothesis, Processor $p_t$ is enabled for rule $(R_4)$. Since the guard of this rule
involves only local variables, it remains infinitely enabled. Since the daemon is
weakly fair, $p_t$ executes this rule in a finite time. Consequently, $buf_{p_t}(i+t-1)$
becomes a buffer of type $A$ and $C_m$ becomes a caterpillar of type $A$ in a finite time.
Then, Property $(P_l)$ is satisfied.

Case 1.2: $2 \le i + t - 1 \le D$.

This case is similar to case 1 of the initialization. Consequently, $C_m$ becomes a
caterpillar of type $A$ in a finite time. Then, Property $(P_l)$ is satisfied.

Case 2: $p_t \ne d$.

Assume w.l.o.g. that $buf_{p_t}(i+t-1) = (m, r, q, d, S)$. We want to prove that the head of
$C_m$ moves up by one buffer in a finite time. We must study the following cases:

Case 2.1: $i + t = D + 1$.

1. If $buf_r(i+t) = \varepsilon$, then Processor $r$ is enabled by rule $(R_{12})$. Since Processor
$choice_r(i+t)$ is not enabled, this rule remains infinitely enabled for $r$. Processor
$r$ executes this rule in a finite time because the daemon is weakly fair. The
result of this execution depends on the value of $choice_r(i+t)$:

(a) If $choice_r(i+t) = p_t$, then the head of $C_m$ moves up by one buffer when $r$
executes rule $(R_{12})$.

(b) If $choice_r(i+t) = s \ne p_t$, then $buf_r(i+t)$ takes the value $(m', r', s, d', c)$ when
$r$ executes rule $(R_{12})$. This leads us to case 2.b. Note that the fairness of
$choice_r(i+t)$ ensures that this case can appear only a finite number of
times.

2. Consider now that $buf_r(i+t) = (m', r', q', d', c')$.

Assume that $q' = p_t$ and $m' = m$. Then $buf_r(i+t)$ belongs to $C_m$ (the type of $C_m$
is then identical to the one of $buf_r(i+t)$). Consequently, we have a contradiction
with the definition of $C_m$. This implies that $q' \ne p_t$ or $m' \ne m$. Let $C_{m'}$ be the
caterpillar to which $buf_r(i+t)$ belongs. Consider the three possible cases:

(a) $C_{m'}$ is of type $S$: we can apply the induction hypothesis to $C_{m'}$ since its head
stays in a buffer of rank greater than or equal to $i + t$. Consequently, $C_{m'}$ becomes
a caterpillar of type $F$ or $A$ in a finite time. That leads us to one of the
following cases.

(b) $C_{m'}$ is of type $A$: following Lemmas 6 and 7, $C_{m'}$ disappears in a finite time.
Then, $buf_r(i+t)$ becomes empty. That leads us to point 1.

(c) $C_{m'}$ is of type $F$: following Lemmas 6 and 8, $C_{m'}$ disappears or becomes a
caterpillar of type $S$ and length 1 in a finite time. In all cases, $buf_r(i+t)$
becomes empty (since $i + t = D + 1 \ge 2$). That leads us to point 1.

Case 2.2: $2 \le i + t \le D$.

Consider the following cases:

1. $buf_r(i+t) = \varepsilon$.

Assume w.l.o.g. that $s = choice_r(i+t)$ and $buf_s(i+t-1) = (m', r, q', d', c')$. By
the construction of rule $(R_8)$ and the definition of a caterpillar, $r$ is enabled if
and only if $buf_{nextHop_r(d')}(i+t+1)$ is not the tail of an abnormal caterpillar
$C_{m'}$ associated to $m'$. Let us study the following cases:

(a) $C_{m'}$ is of type $S$: we can apply the induction hypothesis to $C_{m'}$ since its head
stays in a buffer of rank greater than or equal to $i + t + 1$. Consequently, $C_{m'}$
becomes a caterpillar of type $F$ or $A$ in a finite time. That leads us to one of
the following cases.

(b) $C_{m'}$ is of type $A$: following Lemma 6, $C_{m'}$ disappears in a finite time. Then,
$buf_{nextHop_r(d')}(i+t+1)$ becomes empty.

(c) $C_{m'}$ is of type $F$: following Lemma 6, $C_{m'}$ disappears in a finite time (it
cannot become a caterpillar of type $S$ and length 1 since $buf_r(i+t) = \varepsilon$).
Consequently, $buf_{nextHop_r(d')}(i+t+1)$ becomes empty in a finite time.

Then, rule $(R_8)$ is enabled for $r$ in a finite time. This rule remains infinitely
enabled since no message of type $(m'', r', r, d'', c'')$ can be copied in
$buf_{nextHop_r(d')}(i+t+1)$ (indeed, the contrary implies that $nextHop_r(d')$
executes rule $(R_8)$ whereas $buf_r(i+t) = \varepsilon$). Since the daemon is weakly fair, $r$
executes rule $(R_8)$ in a finite time. The result of this execution is one of the
following:

(a) If $choice_r(i+t) = p_t$, then the head of $C_m$ moves up by one buffer when $r$
executes rule $(R_8)$.

(b) If $choice_r(i+t) = s \ne p_t$, then $buf_r(i+t)$ takes the value $(m', r', s, d', c)$ when
$r$ executes rule $(R_8)$. This situation is similar to the one of point 2 below.
Note that the fairness of $choice_r(i+t)$ ensures that this case can appear
only a finite number of times.

2. If $buf_r(i+t) = (m', r', q', d', c')$, the reasoning is similar to the one of point 2 of
case 2.1. Consequently, that leads us to point 1 in a finite time.

In conclusion of case 2 ($p_t \ne d$), the head of $C_m$ moves up by one buffer in a finite time.
Then, the induction hypothesis allows us to state that $C_m$ becomes a caterpillar of
type $F$ or $A$ in a finite time. Consequently, $(P_l)$ is satisfied.

□ 

Snap-stabilization when routing tables are correct in the initial configuration. Now, we
assume that routing tables are correct in the initial configuration and we prove that $\mathcal{SSMFP}_2$ is
a snap-stabilizing algorithm for specification $\mathcal{SP}'$.

Lemma 10 Let $\gamma$ be a configuration in which routing tables are correct and $m$ be a message of
Destination $d$ existing in $\gamma$. Under a weakly fair daemon, every normal caterpillar of type $S$
associated to $m$ becomes a caterpillar of type $A$ in a finite time.

Proof. Let $\gamma$ be a configuration of the network in which routing tables are correct and $m$
be a message (of Destination $d$) existing in $\gamma$. Let $C_m = buf_{p_1}(1) \ldots buf_{p_t}(t)$ ($t \ge 1$) be a normal
caterpillar of type $S$ associated to $m$.

By Lemma 9, $C_m$ becomes a caterpillar of type $A$ or $F$ in a finite time. In the first case, the
proof ends here. In the second case (which is possible if $D + 1 - t < dist(p_t, d)$ in $\gamma$), it follows by
Lemma 8 that $C_m$ becomes a caterpillar of type $S$ of length 1 in a finite time. Then, we have:
$C_m = buf_{p_1}(1)$.

Following Lemma 9, $C_m$ becomes a caterpillar of type $F$ or $A$ in a finite time. Assume that
$C_m$ becomes a caterpillar of type $F$. This implies that $m$ has been forwarded $D$ times without
reaching its destination. This is absurd since we have by definition that $dist(p_1, d) \le D$ and we
assumed that routing tables are correct and constant. Consequently, $C_m$ becomes a caterpillar of
type $A$ in a finite time. □

Lemma 11 If routing tables are correct, every processor can generate a first message (i.e., it can
execute $(R_1)$) in a finite time under a weakly fair daemon.

Proof. Let $p$ be a processor of the network which has a message $m$ (of Destination $d$) to
forward. As $p$ has a waiting message, the higher layer sets $request_p = true$ whatever its value in
the initial configuration.

Assume that $buf_p(1)$ already contains a message. Let $C_m$ be the caterpillar which contains this
buffer. We must distinguish the following cases:

Case 1: $C_m$ is of type $F$. Following Lemma 8, $C_m$ becomes a caterpillar of type $S$ in a finite time.
That leads us to case 2.

Case 2: $C_m$ is of type $S$. Following Lemma 10, $C_m$ becomes a caterpillar of type $A$ in a finite
time. That leads us to case 3.

Case 3: $C_m$ is of type $A$. Following Lemma 7, $C_m$ disappears in a finite time.

In all cases, we obtain that $buf_p(1)$ becomes empty in a finite time. It remains empty as long as $p$
does not execute rule $(R_1)$ (since it is the only rule which can put a message in this buffer). In
this case, $(R_1)$ is enabled for $p$ if and only if $buf_{nextHop_p(d)}(2) \ne (m, r', p, d, c)$.

Assume that this condition is not satisfied. This implies (by definition of a caterpillar) that
$buf_{nextHop_p(d)}(2)$ is the tail of an abnormal caterpillar $C'_m$. Following sequentially Lemmas 9 and
6, $C'_m$ disappears in a finite time (note that the merge with $buf_p(1)$ is impossible since this buffer
is empty). Moreover, $buf_{nextHop_p(d)}(2)$ cannot be filled by a message of type $(m, r', p, d, c)$ (since
$buf_p(1)$ is empty). Consequently, rule $(R_1)$ is infinitely enabled for Processor $p$. As the daemon is
weakly fair, $p$ executes this rule in a finite time, which leads to the lemma. □

Lemma 12 If a message $m$ is generated by $\mathcal{SSMFP}_2$ in a configuration in which routing tables
are correct, $\mathcal{SSMFP}_2$ delivers $m$ to its destination in a finite time under a weakly fair daemon.

Proof. The generation of a message $m$ (of Destination $d$) by $\mathcal{SSMFP}_2$ results from the
execution of rule $(R_1)$ by the processor which sends $m$. This rule creates a normal caterpillar of
type $S$ associated to $m$. Following Lemma 10, this caterpillar becomes a caterpillar of type $A$ in a
finite time. This is due to the execution of rule $(R_4)$ or $(R_{10})$ by $d$. These rules deliver the message
to the higher layer of $d$, which ends the proof. □

Proposition 7 $\mathcal{SSMFP}_2$ is a snap-stabilizing message forwarding protocol for $\mathcal{SP}'$ if routing
tables are correct in the initial configuration.

Proof. Assume that routing tables are correct in the initial configuration. To prove that
$\mathcal{SSMFP}_2$ is a snap-stabilizing message forwarding protocol for specification $\mathcal{SP}'$, we must prove
that:

1. If a processor $p$ requests to send a message, then the protocol is initiated by at least one
starting action on $p$ in a finite time. In our case, the starting action is the execution of $(R_1)$.
Lemma 11 proves this property.

2. After a starting action, the protocol is executed according to $\mathcal{SP}'$. If we consider that $(R_1)$
has been executed at least once, we can prove that:

• The first property of $\mathcal{SP}'$ is always satisfied (following Lemma 11 and the fact that the
waiting for the sending of new messages is blocking).

• The second property of $\mathcal{SP}'$ is always satisfied (following Lemma 12).

Consequently, we deduce the proposition. □

Self-stabilization. Now, we assume that routing tables are corrupted in the initial configuration
and we prove that $\mathcal{SSMFP}_2$ is a self-stabilizing algorithm for specification $\mathcal{SP}'$.

Proposition 8 $\mathcal{SSMFP}_2$ is a self-stabilizing message forwarding protocol for $\mathcal{SP}'$ even if routing
tables are corrupted in the initial configuration when $\mathcal{A}$ runs simultaneously.

Proof. Recall that $\mathcal{A}$ is a self-stabilizing silent algorithm for computing routing tables running
simultaneously with $\mathcal{SSMFP}_2$. Moreover, we assumed that $\mathcal{A}$ has priority over $\mathcal{SSMFP}_2$ (i.e., a
processor which has enabled actions for both algorithms always chooses the action of $\mathcal{A}$). This
guarantees that routing tables are correct and constant in a finite time regardless of their initial
states.

By Proposition 7, $\mathcal{SSMFP}_2$ is a snap-stabilizing message forwarding protocol for specification
$\mathcal{SP}'$ when it starts from such a configuration. Consequently, the proposition follows.
□
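
The priority assumption used in this proof can be pictured with a small sketch. It is only an illustration under stated assumptions: the function `choose_action` and the string labels are hypothetical, and the actual rules of both algorithms are abstracted as opaque action names.

```python
def choose_action(enabled_routing, enabled_forwarding):
    """Pick the action executed by an activated processor.

    enabled_routing: enabled actions of the routing algorithm A (list).
    enabled_forwarding: enabled actions of the forwarding protocol (list).
    Any enabled action of A is chosen before any forwarding action.
    """
    if enabled_routing:
        return ("A", enabled_routing[0])
    if enabled_forwarding:
        return ("forwarding", enabled_forwarding[0])
    return None  # the processor is not enabled at all


# While A still has work to do, forwarding rules are postponed.
during_stabilization = choose_action(["update_table"], ["R1"])
# Once A is silent (no enabled action), forwarding rules are executed.
after_stabilization = choose_action([], ["R1"])
```

Since the routing algorithm is silent, the first branch eventually stops being taken, which is why the forwarding protocol then runs on correct and constant routing tables.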

Snap-stabilization. We still assume that routing tables are corrupted in the initial configuration
and we prove that $\mathcal{SSMFP}_2$ is a snap-stabilizing algorithm for specification $\mathcal{SP}$.

Lemma 13 Under a weakly fair daemon, $\mathcal{SSMFP}_2$ does not delete a valid message without
delivering it to its destination even if $\mathcal{A}$ runs simultaneously.

Proof. When $\mathcal{SSMFP}_2$ accepts a new valid message $m$, the processor which sends $m$ executes
rule $(R_1)$. By construction of the rule, this execution creates a normal caterpillar $C_m$ of type $S$
associated to $m$.

While $m$ is not delivered to its destination, we know, by Lemmas 9 and 8, that $C_m$ follows
infinitely often the following cycle:

• $C_m$ is of type $S$ and becomes of type $F$ (type $A$ is impossible since $m$ is not delivered).

• $C_m$ is of type $F$ and becomes of type $S$.

This implies that there always exists at least one copy of $m$ in $buf_p(1)$ (if $p$ is the sending
processor of $m$). Then, this message is not deleted without being delivered to its destination. □
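
The cycle used in this proof can be written down as a small transition table. This is a sketch under stated assumptions: it only encodes the conclusions of Lemmas 7, 8 and 9 for a normal caterpillar, ignores buffers and rules entirely, and the names `NEXT`, `is_valid_history` and `undelivered_prefix` are hypothetical.

```python
S, A, F, GONE = "S", "A", "F", "gone"

# Type changes allowed for a normal caterpillar, as stated by the lemmas.
NEXT = {
    S: {A, F},      # Lemma 9: type S becomes type A or F
    F: {S},         # Lemma 8: type F becomes type S (of length 1)
    A: {GONE},      # Lemma 7: type A disappears
    GONE: set(),    # nothing happens after the caterpillar has disappeared
}


def is_valid_history(history):
    """Check that each consecutive pair of types is an allowed change."""
    return all(b in NEXT[a] for a, b in zip(history, history[1:]))


def undelivered_prefix(k):
    """First k types of a caterpillar whose message is not yet delivered."""
    return [S if j % 2 == 0 else F for j in range(k)]
```

In this abstraction, a caterpillar can only reach `GONE` through type `A`, that is, after a delivery; as long as the message is undelivered the history alternates between `S` and `F`.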



Lemma 14 Under a weakly fair daemon, $\mathcal{SSMFP}_2$ never duplicates a valid message even if $\mathcal{A}$
runs simultaneously.

Proof. The emission of a message $m$ by rule $(R_1)$ creates exactly one caterpillar
of type $S$ associated to $m$.

Then, observe that all rules are designed to obtain the following property: if a caterpillar has
one head in a configuration, it also has one head in the following configuration, whatever rules have
been applied. Indeed, this property is ensured by the fact that the next processor on the path of a
message $m$ is computed (and put in the second field of the message) when $m$ is copied into a buffer
$buf_p(i)$ (not when it is forwarded to a neighbor). Consequently, if a routing table changes
after the copy of $m$ in $buf_p(i)$, the caterpillar does not fork. The head of the caterpillar remains
unique.

We can conclude that, for any valid message $m$, there always exists a unique caterpillar $C_m$
associated to $m$. Assume that $m$ is delivered. By construction of rules $(R_4)$ and $(R_{10})$, $C_m$
becomes of type $A$. Following Lemma 7, $C_m$ disappears in a finite time. Consequently, $m$ cannot
be delivered several times. □
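
The key design choice of this proof (the next hop is fixed when the message is copied, not when it is forwarded) can be sketched as follows. The class `BufferEntry`, the function `copy_into_buffer` and the field names are hypothetical; the five fields mirror the tuples $(m, r, q, d, c)$ used above, with the second field read as the next processor and, as an assumption made here, the third field read as the processor the message was copied from.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BufferEntry:
    message: str
    next_hop: str   # second field: computed once, at copy time
    prev_hop: str   # third field (assumed meaning: where the copy came from)
    dest: str
    kind: str       # type of the buffer content, e.g. "S", "A" or "F"


def copy_into_buffer(message, prev_hop, dest, routing_table, kind="S"):
    """Copy a message into a buffer, freezing the next hop read right now."""
    return BufferEntry(message, routing_table[dest], prev_hop, dest, kind)


table = {"d": "r1"}
entry = copy_into_buffer("m", prev_hop="q", dest="d", routing_table=table)
table["d"] = "r2"   # the routing table changes after the copy
```

After the table changes, `entry.next_hop` still designates `"r1"`, so only one neighbor can ever take the message over: the caterpillar keeps a single head.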

Theorem 2 $\mathcal{SSMFP}_2$ is a snap-stabilizing message forwarding protocol for $\mathcal{SP}$ even if routing
tables are corrupted in the initial configuration when $\mathcal{A}$ runs simultaneously.

Proof. Proposition 8 and Lemma 13 allow us to conclude that $\mathcal{SSMFP}_2$ is a snap-stabilizing
message forwarding protocol for specification $\mathcal{SP}'$ even if routing tables are corrupted in the initial
configuration, provided that $\mathcal{A}$ runs simultaneously.

Then, using this remark and Lemma 14, we obtain the result. □



5.4 Time complexities

Since our algorithm needs a weakly fair daemon, there is no point in doing an analysis in terms
of steps. This is why all the following complexity analyses are given in rounds. Let $R_A$ be the
stabilization time of $\mathcal{A}$ in terms of rounds.

To lighten the paper, we present only the key ideas of the proofs of this section.

Proposition 9 In the worst case, $\Theta(nD)$ invalid messages are delivered to Processor $d$.

Sketch of proof. In the initial configuration, the system has at most $n(D + 1)$ distinct invalid
messages of Destination $d$. Then, the number of invalid messages delivered to $d$ is in $O(nD)$.

We can obtain the lower bound with a chain of $n = 2q + 1$ processors labeled $p_1, p_2, \ldots, p_n$.
Assume that all buffers of rank less than or equal to $q + 1$ initially contain a message of Destination
$p_{q+1}$ and that the other buffers are empty. Moreover, assume that routing tables are initially correct. Then,
$\mathcal{SSMFP}_2$ delivers all invalid messages of this initial configuration to $p_{q+1}$. This initial configuration
contains $n(q + 1) = n(\frac{D}{2} + 1) \in \Omega(nD)$ invalid messages. The result follows. □
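
The counting in this sketch of proof can be checked numerically. The helper `chain_counts` is a hypothetical name introduced here; it only evaluates the two quantities of the argument and relies on the standard fact that a chain of $n$ processors has diameter $D = n - 1$ (so $q = D/2$). It does not simulate the protocol.

```python
def chain_counts(q):
    """Return (n, D, initial_invalid, upper_bound) for a chain of n = 2q + 1 processors."""
    n = 2 * q + 1
    D = n - 1                        # diameter of a chain of n processors
    initial_invalid = n * (q + 1)    # buffers of rank <= q + 1, one message each
    upper_bound = n * (D + 1)        # total number of buffers in the network
    return n, D, initial_invalid, upper_bound


n, D, initial_invalid, upper_bound = chain_counts(3)
```

For every $q$, the initial configuration fills more than half of the $n(D+1)$ buffers, which is why the lower bound matches the upper bound up to a constant factor.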

Proposition 10 In the worst case, a message $m$ (of Destination $d$) needs $O(\max(R_A, nD\Delta^D))$
rounds to be delivered to $d$ once it has been sent out by its source.

Sketch of proof. First, one must prove by induction the following fact: if $\gamma$ is a
configuration in which routing tables are correct and in which a message $m$ of Destination $d$ exists,
and $C_m$ is a caterpillar of type $S$ associated to $m$ whose head is a buffer of rank $1 \le i+t-1 < D+1$
on $p \ne d$, then the head of $C_m$ moves up by one buffer in at most $O(\Delta^{D+1-(i+t-1)})$ rounds if there
exists no abnormal caterpillar whose tail is a buffer of rank greater than $i + t$.

Second, it is possible to show that $C$, the set of abnormal caterpillars in $\gamma$, loses at
least one element during the $O(\Delta^D)$ rounds which follow $\gamma$. Then, we can say that, when routing
tables are correct, an accepted message is forwarded in at most $O(nD\Delta^D)$ rounds.

Finally, we can deduce the result when $m$ is emitted in a configuration in which routing tables
are not correct, since the message is delivered in at most $O(nD\Delta^D)$ rounds after the routing tables
computation (which takes at most $O(R_A)$ rounds if $m$ is not delivered during the routing tables
computation, since we have assumed the priority of $\mathcal{A}$). □
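
To get a feeling for the magnitude of this bound, the expression inside the $O(\cdot)$ can be evaluated for concrete parameters. This is only an illustrative sketch: the function `delivery_bound` is a hypothetical helper, the constant hidden by the $O$ notation is ignored, and $\Delta$ is assumed here to denote the maximum degree of the network.

```python
def delivery_bound(r_a, n, d, delta):
    """Evaluate max(R_A, n * D * Delta^D), ignoring the hidden constant.

    r_a: stabilization time of the routing algorithm (in rounds).
    n: number of processors; d: diameter; delta: assumed maximum degree.
    """
    return max(r_a, n * d * delta ** d)


small_network = delivery_bound(r_a=50, n=10, d=4, delta=3)
slow_routing = delivery_bound(r_a=10**6, n=10, d=4, delta=3)
```

The first call is dominated by the forwarding term ($10 \cdot 4 \cdot 3^4$), the second by the stabilization time of the routing algorithm, which matches the two regimes distinguished in the sketch of proof.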

Proposition 11 The delay (waiting time before the first emission) and the waiting time (between
two consecutive emissions) of $\mathcal{SSMFP}_2$ are $O(\max(R_A, nD\Delta^D))$ rounds in the worst case.

Sketch of proof. Let $p$ be a processor which has a message $m$ of Destination $d$ to emit. By the
fairness of $choice_p(d)$, we can say that $m$ is sent after at most $(\Delta - 1)$ releases of $buf_p(1)$. The
result of Proposition 10 allows us to say that $buf_p(1)$ is released in $O(\max(R_A, nD\Delta^D))$ rounds at
worst. Hence, we can deduce the result. □

The complexity obtained in Proposition 10 is due to the fact that the system delivers a huge
quantity of messages during the forwarding of the considered message. This is why we now consider
the amortized complexity (in rounds) of our algorithm. For an execution $\Gamma$, this measure is equal
to the number of rounds of $\Gamma$ divided by the number of messages delivered during $\Gamma$ (see [5] for a
formal definition).

Proposition 12 The amortized complexity (to forward a message) of $\mathcal{SSMFP}_2$ is in
$O(\max(R_A, D))$ rounds when there exist no invalid messages.

Sketch of proof. First, we must prove the following property: if $\gamma$ is a configuration
in which at least one caterpillar of type $S$ is present, routing tables are correct, and there exist no
invalid messages, then $\mathcal{SSMFP}_2$ delivers at least one message to a processor in the $3D + 1$ rounds
following $\gamma$.

Assume now an initial configuration in which routing tables are correct and in which there
exist no invalid messages. Let $\Gamma$ be an execution which leads to the worst amortized complexity.
Let $R_\Gamma$ be the number of rounds of $\Gamma$. By the last remark, we can say that $\mathcal{SSMFP}_2$ delivers at
least $\frac{R_\Gamma}{3D+1}$ messages during $\Gamma$. So, we have an amortized complexity of
$\frac{R_\Gamma}{R_\Gamma / (3D+1)} = 3D + 1 \in O(D)$. Then, the
announced result is obvious. □

5.5 Conclusion

In this section, we proved that we can adapt the "distance-based" deadlock-free controller defined in
[21] to obtain a snap-stabilizing message forwarding algorithm. Our algorithm is mainly based on
an acknowledgement scheme. Each message is re-emitted until it reaches its destination. As routing
tables stabilize in a finite time, we are ensured that, in the worst case, the message is re-emitted
after the end of the computation of routing tables. Hence, it can reach its destination normally.

The initial fault-free protocol uses $n(D + 1)$ buffers for the whole network and our protocol
uses exactly the same number of buffers. Consequently, our protocol ensures a stronger safety and
fault-tolerance than the initial one without overcost in space. Our time analysis (see Section
5.4) shows that this stronger safety does not lead to an overcost in time.




6 Conclusion 



In this paper, we provide the first algorithms (to our knowledge) to solve the message forwarding
problem in a snap-stabilizing way (when a self-stabilizing algorithm for computing routing tables
runs simultaneously) for a specification which forbids message losses and duplication. This property
implies the following fact: our protocol can forward any emitted message to its destination regardless
of the state of routing tables in the initial configuration. Such an algorithm allows the processors of
the network to send messages to others without waiting for the routing table computation. We use a
tool called "buffer graph" which has been introduced in [21]. This paper proposes an adaptation of
two "buffer graphs" in order to control the effect of routing table changes on messages. Our analysis
shows that we ensure snap-stabilization without significant overcost in space or in time with respect
to the fault-free algorithm.

[21] also proposed other buffer graphs. So, it is natural to wonder if they could be adapted to
tolerate transient faults. In particular, one of them (based on the acyclic covering of the network,
see also [24]) is very interesting since it needs fewer buffers per processor in general (3 for a ring,
2 for a tree...). However, the authors of [19] show that it is NP-hard to compute the size of the acyclic
covering of an arbitrary graph. So, this buffer graph cannot be easily applied to any network. A very
important open problem is the following: what is the minimal number of buffers per processor needed to
allow snap-stabilization for the message forwarding problem?

Another way to improve our protocol is to speed up the message forwarding in the worst case
(without increasing the amortized complexity). To this end, we believe that we can keep our protocol
and modify the fair scheme of selection of messages $choice_p(d)$. In fact, the complexity of our
algorithm depends on the number of messages which can "pass" a specific message at each hop.

Our protocol has the following drawback: when a message $m$ is delivered to a processor $p$, $p$
cannot determine if $m$ is valid or not. This can cause problems for applications which use
these messages. So, an interesting direction for future research could be to design a protocol which
solves this problem. In [6], the authors propose an efficient solution for the PIF problem that deals
with a similar problem; unfortunately, their approach does not seem suitable for our problem.

Finally, it would be interesting to port our protocol to the message passing model (a more
realistic model of distributed systems) in order to enable snap-stabilizing message forwarding in a
real network. To our knowledge, in this model, only two snap-stabilizing protocols exist in the
literature (0 [H]). The problem of automatically porting a protocol from the state model to the
message passing model is still open.